NUS Multi-Sensor Presentation (NUSMSP) Dataset



[2015-10-19] Dataset is ready


Oral presentation has been an effective method for delivering information to a group of participants for many years. In the past couple of decades, technological advancements have revolutionized the way people deliver presentations. However, for a variety of reasons, presentation quality varies widely, which can affect how effective a presentation is. Assessing the quality of a presentation usually requires painstaking manual analysis by experts. Expert feedback can certainly help people improve their presentation skills, but manual evaluation by experts is not cost-effective and may not be available to most people.

In this work, we collected a novel NUS Multi-Sensor Presentation (NUSMSP) Dataset, which contains 51 real-world presentations recorded in a multi-sensor environment. The NUSMSP Dataset was recorded between December 2014 and February 2015 at the National University of Singapore (NUS), in a meeting room equipped with two static cameras (each with a built-in microphone), one Kinect depth sensor, and three Google Glasses. The presentations were given by 51 unique speakers (32 males and 19 females). Each speaker was asked to prepare and deliver a 10- to 15-minute presentation on a topic of their choice. The audience for each presentation ranged from 4 to 8 people. In total, the dataset contains about 10 hours of valid presentation data. Due to unpredictable recording conditions, some sensors failed to capture a small portion of the presentations.

For each presentation, the ambient Kinect depth sensor (denoted as AM-K) captured the speaker's behavior as RGB-D data. One ambient static camera (denoted as AM-S 1) captured a high-resolution video of the audience at 1920x1080 and 30 fps in MP4 format, while a second ambient static camera (denoted as AM-S 2) captured an overview of both the speaker and the audience with the same specification. The speaker and two randomly chosen audience members each wore a Google Glass, which recorded video at 1280x720 and 30 fps in MP4 format. In addition, readings from the standard Android sensors TYPE_LINEAR_ACCELERATION, TYPE_ACCELEROMETER, TYPE_LIGHT, TYPE_ROTATION_VECTOR, TYPE_MAGNETIC_FIELD, TYPE_GYROSCOPE and TYPE_GRAVITY on each Glass were recorded at 10 Hz.

All devices except the Kinect depth sensor (i.e., the two static cameras and the three Google Glasses) have a built-in microphone, which recorded the audio during the presentation. These five devices are synchronized by estimating the delay between their audio signals using cross-correlation. The Kinect depth sensor is synchronized with the rest by a periodic LED visual signal.
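The audio-based synchronization described above can be illustrated with a short NumPy sketch. It is not the authors' code: the function name `estimate_delay_seconds` is hypothetical, and it assumes both tracks have already been decoded to mono arrays at a common sample rate. It finds the lag that maximizes the cross-correlation, computed via zero-padded FFTs so that 10- to 15-minute recordings stay tractable. The LED-based Kinect alignment is not covered.

```python
import numpy as np


def estimate_delay_seconds(reference, other, sample_rate):
    """Estimate how far `other` lags behind `reference`, in seconds.

    Both inputs are 1-D mono audio arrays sampled at `sample_rate` Hz.
    A positive result means a sound appears later in `other` than in
    `reference`; a negative result means it appears earlier.
    """
    ref = np.asarray(reference, dtype=float)
    oth = np.asarray(other, dtype=float)
    # Remove DC offsets so they do not bias the correlation peak.
    ref = ref - ref.mean()
    oth = oth - oth.mean()

    # Zero-pad to at least len(ref) + len(oth) - 1 samples so the FFT's
    # circular correlation equals the linear one (power of two for speed).
    n = len(ref) + len(oth) - 1
    nfft = 1 << (n - 1).bit_length()
    spectrum = np.fft.rfft(oth, nfft) * np.conj(np.fft.rfft(ref, nfft))
    corr = np.fft.irfft(spectrum, nfft)  # corr[k] = sum_n oth[n + k] * ref[n]

    # Negative lags wrap around to the end of `corr`; move them to the front
    # so that index i corresponds to lag i - (len(ref) - 1).
    max_neg = len(ref) - 1
    corr = np.concatenate((corr[nfft - max_neg:], corr[:len(oth)]))
    lag = int(np.argmax(corr)) - max_neg
    return lag / sample_rate
```

For instance, taking the AM-S 1 audio as the reference, a positive delay for a Glass track would mean that many seconds should be trimmed from the start of the Glass recording to align it. A direct `np.correlate(..., mode="full")` gives the same lag but scales quadratically with track length.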

[Figure: sample views from Ambient Visual Sensors 1 and 2, Ambient Kinect Sensor 1, and FPV Sensors 1 to 3]

The dataset is manually annotated based on the proposed assessment rubric (see our paper for more details). The full dataset is segmented into 10-second clips. Each clip is annotated by at least five people, and a label is accepted only if at least three annotators agree on it; otherwise, an additional annotator is asked to annotate the clip.
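The clip segmentation and label-acceptance rule above can be sketched in Python. This is a minimal illustration rather than the authors' annotation tooling: the function names, the example label strings, keeping a final clip shorter than 10 seconds, and treating a tie between two labels as "needs another annotator" are all assumptions.

```python
import math
from collections import Counter

CLIP_SECONDS = 10   # clip length stated on this page
MIN_ANNOTATORS = 5  # every clip receives at least five annotations
MIN_AGREEMENT = 3   # an accepted label needs at least three matching votes


def clip_boundaries(duration_s, clip_s=CLIP_SECONDS):
    """Split a recording into consecutive (start, end) windows in seconds.

    Assumption: a trailing remainder shorter than `clip_s` is kept as a
    final, shorter clip rather than dropped.
    """
    n_clips = math.ceil(duration_s / clip_s)
    return [(i * clip_s, min((i + 1) * clip_s, duration_s)) for i in range(n_clips)]


def aggregate_label(votes):
    """Return the accepted label for one clip, or None if another vote is needed."""
    if len(votes) < MIN_ANNOTATORS:
        return None
    top = Counter(votes).most_common(2)
    label, count = top[0]
    # Reject ties (possible once six or more people have voted) as well as
    # labels that fall short of the agreement threshold.
    if count < MIN_AGREEMENT or (len(top) > 1 and top[1][1] == count):
        return None
    return label
```

In this sketch, a clip for which `aggregate_label` returns `None` would be sent to one more annotator, and the check repeated with the enlarged vote list.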


This dataset ('Licensed Material') is made available to the scientific community for non-commercial research purposes such as academic research, teaching, scientific publications or personal experimentation. Permission is granted by National University of Singapore (NUS) to you (the 'Licensee') to use, copy and distribute the Licensed Material in accordance with the following terms and conditions:

  1. Licensee must include a reference to NUS and the following publications in any published work that makes use of the Licensed Material:

      T. Gan, Y. Wong, B. Mandal, V. Chandrasekhar, M. Kankanhalli
      Multi-sensor Self-Quantification of Presentations
      ACM Multimedia, pp. 601-610, 2015.

      Bibtex entry:
      	AUTHOR    = {Tian Gan and YongKang Wong and Bappaditya Mandal and Vijay Chandrasekhar and Mohan S. Kankanhalli},
      	TITLE     = {Multi-sensor Self-Quantification of Presentations},
      	BOOKTITLE = {Proceedings of the 23rd Annual ACM Conference on Multimedia Conference},
      	YEAR      = {2015},
      	PAGES     = {601--610}

      Junnan Li, Y. Wong, M. Kankanhalli
      Multi-stream Deep Learning Framework for Automated Presentation Assessment
      IEEE International Symposium on Multimedia (ISM), 2016.

      Bibtex entry:
      	AUTHOR    = {Junnan Li and YongKang Wong and Mohan S. Kankanhalli},
      	TITLE     = {Multi-stream Deep Learning Framework for Automated Presentation Assessment},
      	BOOKTITLE = {Proceedings of the IEEE International Symposium on Multimedia},
      	YEAR      = {2016}

  2. If Licensee alters the content of the Licensed Material or creates any derivative work, Licensee must include in the altered Licensed Material or derivative work prominent notices to ensure that any recipients know that they are not receiving the original Licensed Material.

  3. Licensee may not use or distribute the Licensed Material or any derivative work for commercial purposes including, but not limited to, licensing or selling the Licensed Material or using the Licensed Material for commercial gain.

  4. The Licensed Material is provided 'AS IS', without any express or implied warranties. NUS does not accept any responsibility for errors or omissions in the Licensed Material.

  5. This original license notice must be retained in all copies or derivatives of the Licensed Material.

  6. All rights not expressly granted to the Licensee are reserved by NUS.



  • The NUSMSP dataset takes up about 80 GB. Each split tar.xz file is around 13 GB on average.

  • A brief description of the data can be found in the README.

  • Please download only one file at a time, so that the server is not overloaded.

  • Microsoft Windows users can extract the *.tar.xz files with 7-Zip.

NUSMSP Dataset (Split) [NUSMSP_01.tar.xz]  [NUSMSP_02.tar.xz]  [NUSMSP_03.tar.xz]  [NUSMSP_04.tar.xz]  [NUSMSP_05.tar.xz]  [NUSMSP_06.tar.xz]
NUSMSP Dataset (Full) [NUSMSP_full.tar.xz]
NUSMSP Annotations [NUSMSP_Annotations.tar.gz]
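As an alternative to 7-Zip, Python's standard `tarfile` module can unpack both the .tar.xz and .tar.gz archives on any platform. This is a minimal sketch: the helper name `extract_archive` is hypothetical, and it assumes each split file (NUSMSP_01 to NUSMSP_06) is an independent archive, which this page does not state explicitly.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest):
    """Extract one .tar.xz or .tar.gz archive into `dest`; return member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (xz, gzip, ...) on its own.
    with tarfile.open(archive, mode="r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links escaping dest, etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
        # Members are cached after extraction, so this does not re-read the archive.
        return tar.getnames()
```

For example, once the splits have been downloaded (one at a time) into the current directory, calling `extract_archive(f"NUSMSP_{i:02d}.tar.xz", "NUSMSP")` for `i` from 1 to 6 would unpack them in turn into a hypothetical `NUSMSP` folder.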


If you have any questions regarding the dataset, please contact:

    {gantian ät u döt nus döt edu} or {yongkang döt wong ät ieee döt org}


    This research was carried out at the NUS-ZJU Sensor-Enhanced Social Media (SeSaMe) Centre. It is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the Interactive Digital Media Programme Office.