Perceptual User Interfaces

MPIIEmo

As a guideline for our actors' improvisation, we provided them with 7 scenarios, each of which could evolve in 4 different ways ("subscenarios"). A complete list of all scenarios and subscenarios is included in the supplementary material. Each of the 8 recorded pairs of actors performed all 28 (sub)scenarios, so the dataset consists of 7 × 4 × 8 = 224 sequences. Every sequence was recorded from 8 viewpoints, resulting in 1792 video files.

We provide the videos in archives separated by viewpoint. To get an impression of the interactions, it is best to start with viewpoint 2, as it gives a good overview of the environment. The audio track in the videos is a simple mono downmix of all 4 recorded audio channels. We will release the raw audio files soon.

The files inside each archive are named as follows: <id of scenario>_<id of subscenario>_A<id of actor starting in kitchen>_B<id of actor starting outside kitchen>.avi
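When working with the archives, it can help to split each file name back into its scenario, subscenario, and actor ids and to check that the counts add up. The sketch below assumes that all ids are plain integers (possibly zero-padded) and that file names follow exactly the pattern given above. The regular expression, the `SequenceName` type, `parse_sequence_name`, and the example file name `3_2_A5_B6.avi` are illustrative and not part of the official release, so verify them against the actual archive contents.

```python
import re
from typing import NamedTuple

# Assumed pattern: <scenario>_<subscenario>_A<actor>_B<actor>.avi with integer ids.
# Zero-padding (e.g. "A01") is tolerated because int("01") == 1.
FILENAME_RE = re.compile(
    r"^(?P<scenario>\d+)_(?P<subscenario>\d+)_A(?P<actor_a>\d+)_B(?P<actor_b>\d+)\.avi$"
)


class SequenceName(NamedTuple):
    scenario: int
    subscenario: int
    actor_a: int  # actor starting in the kitchen
    actor_b: int  # actor starting outside the kitchen


def parse_sequence_name(filename: str) -> SequenceName:
    """Split a video file name into its ids (hypothetical helper)."""
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return SequenceName(
        scenario=int(m.group("scenario")),
        subscenario=int(m.group("subscenario")),
        actor_a=int(m.group("actor_a")),
        actor_b=int(m.group("actor_b")),
    )


# Expected dataset size, taken from the description above.
NUM_SCENARIOS = 7
SUBSCENARIOS_PER_SCENARIO = 4
NUM_ACTOR_PAIRS = 8
NUM_VIEWPOINTS = 8

num_sequences = NUM_SCENARIOS * SUBSCENARIOS_PER_SCENARIO * NUM_ACTOR_PAIRS
num_videos = num_sequences * NUM_VIEWPOINTS
```

For example, `print(parse_sequence_name("3_2_A5_B6.avi"))` prints `SequenceName(scenario=3, subscenario=2, actor_a=5, actor_b=6)`, and `num_sequences` and `num_videos` evaluate to 224 and 1792, matching the numbers stated above. Because every viewpoint archive contains the same sequences, the parsed tuple can serve as a key for aligning the 8 views of one sequence.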

Download

view1, view2, view3, view4, view5, view6, view7, view8 (~4.5 GB each). Supplementary material with a list of all scenarios and subscenarios and a table of similar datasets. Raw annotations from all 5 annotators.

Contact for data set access: Anna Penzkofer, , for questions: Philipp Müller,

The data is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:

Emotion recognition from embedded bodily expressions and speech during dyadic interactions

Philipp Müller, Sikandar Amin, Prateek Verma, Mykhaylo Andriluka, Andreas Bulling

Proc. International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 663-669, 2015.


Previous work on emotion recognition from bodily expressions focused on analysing such expressions in isolation, of individuals or in controlled settings, from a single camera view, or required intrusive motion tracking equipment. We study the problem of emotion recognition from bodily expressions and speech during dyadic (person-person) interactions in a real kitchen instrumented with ambient cameras and microphones. We specifically focus on bodily expressions that are embedded in regular interactions and background activities and recorded without human augmentation to increase naturalness of the expressions. We present a human-validated dataset that contains 224 high-resolution, multi-view video clips and audio recordings of emotionally charged interactions between eight couples of actors. The dataset is fully annotated with categorical labels for four basic emotions (anger, happiness, sadness, and surprise) and continuous labels for valence, activation, power, and anticipation provided by five annotators for each actor. We evaluate vision and audio-based emotion recognition using dense trajectories and a standard audio pipeline and provide insights into the importance of different body parts and audio features for emotion recognition.

doi: 10.1109/ACII.2015.7344640

Paper: mueller15_acii.pdf

@inproceedings{mueller15_acii,
  title = {Emotion recognition from embedded bodily expressions and speech during dyadic interactions},
  author = {M{\"{u}}ller, Philipp and Amin, Sikandar and Verma, Prateek and Andriluka, Mykhaylo and Bulling, Andreas},
  year = {2015},
  pages = {663-669},
  doi = {10.1109/ACII.2015.7344640},
  booktitle = {Proc. International Conference on Affective Computing and Intelligent Interaction (ACII)}
}

Hardware:

The dataset was recorded with a camera system from 4D View Solutions.

Acknowledgements:

The authors would like to thank Johannes Tröger for working as a director in our recordings, as well as all involved actors and annotators.
