MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation

Xucong Zhang, Yusuke Sugano, Mario Fritz, Andreas Bulling

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 41(1), pp. 162-175, 2019.

Overview of GazeNet – appearance-based gaze estimation using a deep convolutional neural network (CNN).

Abstract

Learning-based methods are believed to work well for unconstrained gaze estimation, i.e. gaze estimation from a monocular RGB camera without assumptions regarding user, environment, or camera. However, current gaze datasets were collected under laboratory conditions and methods were not evaluated across multiple datasets. Our work makes three contributions towards addressing these limitations. First, we present the MPIIGaze dataset, which contains 213,659 full face images and corresponding ground-truth gaze positions collected from 15 users during everyday laptop use over several months. An experience sampling approach ensured continuous gaze and head poses and realistic variation in eye appearance and illumination. To facilitate cross-dataset evaluations, 37,667 images were manually annotated with eye corners, mouth corners, and pupil centres. Second, we present an extensive evaluation of state-of-the-art gaze estimation methods on three current datasets, including MPIIGaze. We study key challenges including target gaze range, illumination conditions, and facial appearance variation. We show that image resolution and the use of both eyes affect gaze estimation performance, while head pose and pupil centre information are less informative. Finally, we propose GazeNet, the first deep appearance-based gaze estimation method. GazeNet improves on the state of the art by 22% (from a mean error of 13.9 degrees to 10.8 degrees) for the most challenging cross-dataset evaluation.
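The headline figure in the abstract can be checked with a short calculation. The sketch below computes the relative improvement from the two mean errors quoted above. It also shows one common way an error in degrees is obtained for gaze estimation, assuming the error is the angle between predicted and ground-truth 3D gaze direction vectors; the abstract does not spell out that definition. The function names `relative_improvement` and `angular_error_deg` are illustrative and do not come from the paper.

```python
import math


def relative_improvement(baseline_error, new_error):
    """Fractional reduction in error relative to the baseline."""
    return (baseline_error - new_error) / baseline_error


def angular_error_deg(g_pred, g_true):
    """Angle in degrees between two 3D gaze direction vectors.

    Assumption: error is measured as the angle between the predicted
    and ground-truth gaze directions (not stated in the abstract).
    """
    dot = sum(p * t for p, t in zip(g_pred, g_true))
    norm_p = math.sqrt(sum(p * p for p in g_pred))
    norm_t = math.sqrt(sum(t * t for t in g_true))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))


# Mean errors quoted in the abstract for the cross-dataset evaluation.
improvement = relative_improvement(13.9, 10.8)
print(f"Relative improvement: {improvement:.1%}")  # 22.3%
```

The result, about 22.3%, matches the 22% stated in the abstract after rounding.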

Links

doi: 10.1109/TPAMI.2017.2778103

Paper: zhang19_pami.pdf

BibTeX

@article{zhang19_pami,
  title   = {MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation},
  author  = {Zhang, Xucong and Sugano, Yusuke and Fritz, Mario and Bulling, Andreas},
  year    = {2019},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  doi     = {10.1109/TPAMI.2017.2778103},
  pages   = {162--175},
  volume  = {41},
  number  = {1}
}

Acknowledgements

We would like to thank Laura Sesma for her help with the dataset handling and normalisation. This work was funded, in part, by the Cluster of Excellence on Multimodal Computing and Interaction (MMCI) at Saarland University, Germany, an Alexander von Humboldt Postdoctoral Fellowship, Germany, and a JST CREST Research Grant (JPMJCR14E1), Japan.