Bachelor thesis projects
Detecting Personality during Group Interactions from Multimodal Behaviors
Description: This project aims to detect personality from multimodal behavioural data recorded in small groups. You will use a dataset (shown in the figure) of video recordings of group discussions among three or four participants (Müller et al. 2018). Participants also reported their personality by completing a Big Five personality questionnaire. Speech, body movement, posture, eye contact and facial expressions can be extracted from the videos and used to predict a participant’s personality with machine learning or deep learning; a minimal code sketch follows the literature entry below.
Supervisor: Guanhua Zhang
Distribution: 20% Literature, 20% Data Preparation, 40% Implementation, 20% Data Analysis
Requirements: Strong programming skills, experience with machine learning/deep learning and data analysis
Literature: Philipp Müller, Michael Xuelin Huang, and Andreas Bulling. 2018. Detecting Low Rapport During Natural Interactions in Small Groups from Non-Verbal Behaviour. Proceedings of the 23rd International Conference on Intelligent User Interfaces (IUI).
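A minimal sketch of the prediction step, assuming per-participant nonverbal feature vectors (e.g., aggregated speech, posture and gaze statistics) have already been extracted from the videos; the file names, feature layout and the choice of a support vector regressor are illustrative assumptions, not a prescribed pipeline:

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X = np.load("nonverbal_features.npy")   # hypothetical: (n_participants, n_features)
y = np.load("big_five_scores.npy")      # hypothetical: (n_participants, 5), one column per trait

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for idx, trait in enumerate(["O", "C", "E", "A", "N"]):
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    mae = -cross_val_score(model, X, y[:, idx], cv=cv,
                           scoring="neg_mean_absolute_error")
    print(f"{trait}: MAE = {mae.mean():.2f} +/- {mae.std():.2f}")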
Personality Recognition under Data Scarcity
Description: Personality is stable: no matter how much data we collect from a person, each personality trait has only a single score, which serves as the ground-truth label in recognition. For example, we may collect hours of data from a user, yet all of those samples map to one value. This leads to a data scarcity problem, because training a traditional recognition model, especially a regression model, requires the dataset to cover a wide range of score values. Hundreds or even thousands of participants with different personality scores would have to be recruited, which makes data collection and personality research difficult. This project will therefore tackle the problem with data science and machine learning techniques, e.g., data augmentation, embedding, representation learning or unsupervised learning, to effectively predict personality from a small amount of data (a minimal augmentation sketch follows the literature entry below). Several personality datasets, such as AMIGOS (Miranda-Correa et al. 2018) shown in the figure, can be used to test and evaluate your approaches.
Supervisor: Guanhua Zhang
Distribution: 20% Literature, 10% Data Preparation, 50% Implementation, 20% Data Analysis
Requirements: Strong programming skills, experience with machine learning and data analysis
Literature: Juan Abdon Miranda-Correa, Mojtaba Khomami Abadi, Nicu Sebe, and Ioannis Patras. 2018. AMIGOS: A dataset for affect, personality and mood research on individuals and groups. IEEE Transactions on Affective Computing.
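A minimal sketch of one possible augmentation strategy, assuming each recording is a (time, channels) array and every window sliced from it inherits the participant’s single trait score; the window size, stride and noise level are illustrative assumptions, not part of the AMIGOS data format:

import numpy as np

def augment(signal, label, window=512, stride=128, sigma=0.01, rng=None):
    """Slice a (time, channels) recording into overlapping windows and jitter them."""
    rng = rng or np.random.default_rng(0)
    windows, labels = [], []
    for start in range(0, signal.shape[0] - window + 1, stride):
        chunk = signal[start:start + window]
        chunk = chunk + rng.normal(0.0, sigma, size=chunk.shape)  # additive Gaussian noise
        windows.append(chunk)
        labels.append(label)  # every window keeps the same personality score
    return np.stack(windows), np.array(labels)

recording = np.random.randn(4000, 14)         # stand-in for one participant's signals
X_aug, y_aug = augment(recording, label=3.8)  # many samples, one ground-truth score
print(X_aug.shape, y_aug.shape)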
Interpreting Attention-based Visual Question Answering Models with Multimodal Human Visual Attention
Description: In visual question answering (VQA), the task for the network is to answer natural language questions about a given image (Agrawal et al. 2015). VQA is a machine comprehension task and as such allows researchers to test whether models can learn to reason across modalities. It is of interest to researchers from various backgrounds because it intertwines computer vision and natural language processing. Inspired by human visual attention, many recent models incorporate neural attention mechanisms, which give the network the ability to focus on particular elements of the input sequence; as a result, attention-based networks often yield better results. However, performance is no longer the only focus. In addition to high-performing models, researchers (Das et al. 2016, Sood et al. 2021) are also interested in interpreting neural attention and bridging the gap between neural and human visual attention.
To that end, in this project the student will interpret the learned multimodal attention of off-the-shelf VQA models that achieve state-of-the-art (SOTA) results. The student will extend the work of Sood et al. (2021) by applying the same approach for evaluating human versus machine multimodal attention to additional SOTA attentive VQA models, focusing on current high-performing transformer-based networks (a minimal comparison sketch follows the literature below).
Supervisor: Ekta Sood
Distribution: 20% Literature, 10% Data Collection, 30% Implementation, 40% Data Analysis and Evaluation.
Requirements: Interest in attention-based neural networks, cognitive science, and multimodal representation learning; experience with data analysis/statistics; familiarity with machine learning and exposure to at least one of the following frameworks: TensorFlow, PyTorch, or Keras.
Literature: Aishwarya Agrawal, Stanislaw Antol, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual question answering. arXiv:1505.00468. Retrieved from https://arxiv.org/abs/1505.00468
Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, and Dhruv Batra. 2016. Human attention in visual question answering: Do humans and deep networks look at the same regions? arXiv:1606.03556. Retrieved from https://arxiv.org/abs/1606.03556
Ekta Sood, Fabian Kögel, Florian Strohm, Prajit Dhar, and Andreas Bulling. 2021. VQA-MHUG: A gaze dataset to study multimodal neural attention in VQA. Proceedings of the 2021 ACL SIGNLL Conference on Computational Natural Language Learning (CoNLL).
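A minimal sketch of one human-versus-machine attention comparison, assuming both are available as spatial maps over the same image grid; the Spearman rank correlation used here is an illustrative similarity metric, not necessarily the exact evaluation protocol of Sood et al. (2021):

import numpy as np
from scipy.stats import spearmanr

def attention_similarity(human_map, model_map):
    """Rank-correlate a human gaze heatmap with a model attention map over the same grid."""
    rho, _ = spearmanr(human_map.flatten(), model_map.flatten())
    return rho

human = np.random.rand(14, 14)   # stand-in for a fixation heatmap (e.g., from VQA-MHUG)
model = np.random.rand(14, 14)   # stand-in for a transformer attention map
print(f"Spearman rho: {attention_similarity(human, model):.3f}")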
Predicting Neurological Deficits with Gaze
Description: Gaze patterns in individuals with Autism Spectrum Disorder (ASD) have been heavily researched, as atypical eye movements are among the diagnostic signs. For example, Yaneva et al. (2015) built an ASD corpus by extracting gaze data while individuals with ASD performed web-related tasks; the focus of this research was to enhance web accessibility for people with ASD and to work towards ASD detection leveraging gaze information. Regneri and King (2016) used gaze data for discourse analysis on individuals with and without ASD, with the main objective of evaluating text cohesion across groups. Leveraging information from gaze patterns has helped researchers better understand language comprehension and production in individuals with neurological deficits such as ASD.
In this project, we aim to use machine learning approaches to build on such previous work from psychologists and neuroscientists. Given the high variability and lack of resources in this domain, the main objective is to continue building a path towards assistive interactive technologies for individuals with ASD. The task is to classify ASD from gaze data; the research question is whether we can classify individuals according to their gaze patterns. As eye-tracking data can be sparse and labor-intensive to collect, particularly for novel groups, we propose to extract the data using OpenFace (Amos et al. 2016). The data can be crawled from, for example, YouTube videos of people with ASD, covering a range of age groups so that subtypes can be binned. The goal is to classify ASD from gaze information using a machine learning approach (a minimal classification sketch follows the literature below) and then to conduct an in-depth analysis of how gaze patterns vary with variables such as age.
Supervisor: Ekta Sood
Distribution: 20% Literature, 30% Data Collection, 30% Implementation, 20% Data Analysis and Evaluation.
Requirements: Interest in cognitive science (particularly human visual perception) and assistive technologies, familiarity with data processing and analysis/statistics, experience with machine learning. Exposure to at least one of the following frameworks is also helpful: TensorFlow, PyTorch, or Keras.
Literature: Brandon Amos, Bartosz Ludwiczuk, and Mahadev Satyanarayanan. 2016. OpenFace: A general-purpose face recognition library with mobile applications. CMU School of Computer Science, Technical Report CMU-CS-16-118.
Michaela Regneri and Diane King. 2016. Automated discourse analysis of narrations by adolescents with autistic spectrum disorder. Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning.
Victoria Yaneva, Irina Temnikova, and Ruslan Mitkov. 2015. Accessible texts for autism: An eye-tracking study. Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility (ASSETS).
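A minimal sketch of the classification step, assuming aggregate gaze features have already been exported to a CSV; the file name, column names, label encoding and choice of classifier are hypothetical placeholders, not a fixed OpenFace output format:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("gaze_features.csv")                      # hypothetical: one row per video clip
feature_cols = ["gaze_angle_x_std", "gaze_angle_y_std",
                "fixation_rate", "mean_saccade_amplitude"]  # hypothetical aggregate features
X = df[feature_cols]
y = df["label"]                                            # hypothetical: 1 = ASD, 0 = control

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
# A follow-up analysis could group results by an age_group column to study age-specific gaze patterns.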