Improving Common-Sense Reasoning Tasks with a Cognitive Model of Human Visual Attention

Description: Humans have a rich capacity to infer mental states of others by observing their actions. In fact, cognitive scientists have found that even very young infants expect other agents to have object-based goals, to have goals that reflect preferences, to engage in instrumental actions that bring about goals, and to act efficiently towards goals.

The Baby Intuitions Benchmark (BIB) is a comprehensive benchmark that captures the generalisability of human reasoning about other agents. It adapts experiments from infant studies and therefore adopts the same evaluation paradigm, Violation of Expectation.
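
To make the Violation of Expectation paradigm concrete: infants tend to look longer at events that violate their expectations, and a model can be scored analogously by checking whether it assigns more "surprise" to the unexpected outcome of a trial than to the expected one. The following is a minimal sketch under that assumption. The function name `voe_accuracy`, the pairing of trials, and the surprise values are hypothetical and for illustration only; they are not taken from the benchmark's code or results.

```python
def voe_accuracy(trial_pairs):
    """Fraction of trial pairs in which the unexpected outcome is rated
    as more surprising than the expected one.

    trial_pairs: list of (expected_surprise, unexpected_surprise) tuples,
    one tuple per paired test trial (hypothetical input format).
    """
    if not trial_pairs:
        raise ValueError("trial_pairs must not be empty")
    # A pair counts as correct only if the unexpected outcome is strictly
    # more surprising than its expected counterpart.
    correct = sum(1 for expected, unexpected in trial_pairs
                  if unexpected > expected)
    return correct / len(trial_pairs)


# Made-up surprise scores for four paired trials.
pairs = [(0.12, 0.48), (0.30, 0.25), (0.05, 0.40), (0.22, 0.61)]
print(voe_accuracy(pairs))  # 0.75 (three of the four pairs are ordered correctly)
```

A pairwise comparison of this kind depends only on the relative ordering of the two scores within a trial, so the surprise measure does not need to be calibrated across trials.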

Designing models that perform well on these benchmark tasks represents a big step towards AI systems that reason like humans. However, state-of-the-art deep learning models still struggle to capture the 'common sense' knowledge that guides prediction, inference and action in everyday human scenarios. Combining deep learning methods (e.g. CNNs or Transformers) with cognitive models (e.g. EMMA) has the potential to fill this gap, as bridging cognitive and data-driven methods has been shown to be useful for several machine comprehension and visual attention tasks.
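
One possible way to couple the two kinds of model, shown purely as an illustration, is to treat the cognitive model's output as a prior over where attention should go and to bias a network's attention logits with it. The sketch below assumes that the cognitive model yields a fixation probability per input region; the function `guided_attention`, its `weight` parameter, and the example numbers are hypothetical. It is not the method of the cited works, and the design of the actual hybrid method is part of the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_attention(logits, fixation_prior, weight=1.0, eps=1e-8):
    """Bias attention logits with the log of a fixation prior.

    logits:         raw attention scores from the network, one per region
    fixation_prior: per-region fixation probabilities, assumed to come
                    from a cognitive model (hypothetical input)
    weight:         strength of the prior; 0.0 disables the guidance
    """
    if len(logits) != len(fixation_prior):
        raise ValueError("logits and fixation_prior must have equal length")
    biased = [l + weight * math.log(p + eps)
              for l, p in zip(logits, fixation_prior)]
    return softmax(biased)


# Three image regions with uninformative network scores: the resulting
# attention then follows the (made-up) fixation prior.
attention = guided_attention([0.0, 0.0, 0.0], [0.7, 0.2, 0.1])
print([round(a, 3) for a in attention])
```

Adding the log-prior before the softmax amounts to multiplying the network's attention distribution by the prior and renormalising, so `weight` interpolates between purely data-driven attention and attention shaped by the cognitive model.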

Goal: The goal of this thesis is to build on previous work to design a hybrid method that combines deep learning and cognitive models and, for the first time, to apply it to a new domain, common-sense reasoning, by evaluating it on the BIB. If successful, publication at a top-tier conference is very likely.

Supervisors: Matteo Bortoletto and Ekta Sood

Distribution: 10% literature review, 70% implementation, 20% analysis

Requirements: Good knowledge of deep learning, strong programming skills in Python and PyTorch, self-management skills.


Gandhi, Kanishk, et al. 2021. Baby Intuitions Benchmark (BIB): Discerning the goals, preferences, and actions of others. Advances in Neural Information Processing Systems (NeurIPS) 34, p.9963-9976.

Salvucci, Dario D. 2001. An integrated model of eye movements and visual encoding. Cognitive Systems Research, 1(4), p.201-220.

Sood, Ekta, et al. 2020. Improving natural language processing tasks with human gaze-guided neural attention. Advances in Neural Information Processing Systems (NeurIPS) 33, p.6327-6341.