
Interpreting Neural Attention in NLP with Human Visual Attention


Description: Recent work has explored integrating gaze data into attention mechanisms for various computer vision and NLP tasks. However, it remains unknown for which model architectures and which tasks this human-like attention component is actually helpful.

In addition, recent work in our department has shown that the attention weights computed by the pre-trained transformer network XLNet do not actually correlate with human visual attention, yet the model outperforms previous methods on a question answering task.

We therefore ask: do all transformer networks outperform more traditional models such as LSTMs while also showing the least similarity to human visual attention? Is the divergence driven by the language modelling methods of the various pre-trained NLP networks, or is it a by-product of pre-training itself? Do transformer networks used in computer vision tasks also diverge from human attention?

In the scope of this project, the student will implement the BERT and GPT-2 transformer networks for the task of question answering on a popular benchmark dataset. The student will extend our previous interpretability paper (Sood et al. 2020) by comparing human visual attention to pre-trained transformer attention on the same reading comprehension QA task, in particular by extracting attention at various layers and tracking how the divergence from human attention changes over the course of training (see the sketch below). As a next step, the student will implement their own transformer network without pre-training and perform the same analysis.
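To illustrate the kind of analysis involved, here is a minimal sketch (not the project's actual pipeline) of extracting layer-wise attention from a pre-trained BERT model and comparing it to a human attention distribution. It assumes the HuggingFace `transformers` library; the human attention vector is a hypothetical placeholder standing in for real per-token fixation data.

```python
# Sketch: per-layer comparison of BERT attention to human attention.
# Assumes HuggingFace `transformers`; the `human` vector is placeholder data.
import torch
import numpy as np
from scipy.stats import spearmanr, entropy
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq, seq).
num_tokens = inputs["input_ids"].shape[1]

# Hypothetical human attention: one relative fixation weight per token,
# normalised to a probability distribution.
human = np.random.rand(num_tokens)
human /= human.sum()

for layer_idx, layer_att in enumerate(outputs.attentions):
    # Average over heads, then over source tokens, to obtain the average
    # attention each token *receives*; renormalise to a distribution.
    per_token = layer_att[0].mean(dim=0).mean(dim=0).numpy()
    per_token /= per_token.sum()

    rho, _ = spearmanr(per_token, human)  # rank correlation
    kl = entropy(human, per_token)        # KL(human || model)
    print(f"layer {layer_idx:2d}: spearman={rho:+.3f}  KL={kl:.3f}")
```

For the temporal analysis described above, the same comparison would be repeated on model checkpoints saved at intervals during fine-tuning, yielding divergence-over-time curves per layer.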

Supervisor: Ekta Sood

Distribution: 20% Literature, 25% Data processing, 30% Implementation and Experiments, 25% Analysis and Evaluation

Requirements: Interest in machine reading comprehension/question answering tasks, human visual attention, explainability/neural interpretability, and NLP. In addition, experience with machine learning, data processing, and statistics. Experience with TensorFlow and PyTorch will be helpful.

Literature: Ekta Sood, Simon Tannert, Diego Frassinelli, Andreas Bulling, and Ngoc Thang Vu. 2020. Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension. In Proceedings of the 24th Conference on Computational Natural Language Learning (CoNLL).