Multi-duration Saliency Prediction with Implicit Representation Learning
Description: Multi-duration saliency prediction is an important task for understanding how human visual attention evolves over time. Previous methods [2,3] can only predict saliency maps at a few discrete timesteps, for example at 0.5 s, 3 s, and 5 s. Implicit neural representation learning has shown strong performance in modeling continuous functions and has been applied successfully to 3D reconstruction, image and video super-resolution, and talking face generation, among other tasks. The goal of this project is to develop a computational model that uses implicit representation learning to predict human visual attention continuously over time.
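To make the idea of a continuous-time predictor concrete, here is a minimal PyTorch sketch. It is not the architecture of [1] or the model to be built in this project; it only illustrates one way a network could accept a real-valued viewing duration instead of a fixed set of timesteps. The class name ContinuousSaliencyNet, the helper encode_time, the sinusoidal time encoding, and all layer sizes are assumptions chosen for illustration.

```python
import math

import torch
import torch.nn as nn


def encode_time(t: torch.Tensor, num_freqs: int = 6) -> torch.Tensor:
    """Sinusoidal encoding of duration t (seconds): (B,) -> (B, 2 * num_freqs)."""
    freqs = (2.0 ** torch.arange(num_freqs, dtype=t.dtype, device=t.device)) * math.pi
    angles = t[:, None] * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


class ContinuousSaliencyNet(nn.Module):
    """Hypothetical sketch: CNN image features modulated by an MLP over time."""

    def __init__(self, channels: int = 16, num_freqs: int = 6):
        super().__init__()
        self.num_freqs = num_freqs
        # Small CNN that extracts per-pixel image features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # MLP that maps the encoded time to a per-channel scale and shift.
        self.time_mlp = nn.Sequential(
            nn.Linear(2 * num_freqs, 64), nn.ReLU(),
            nn.Linear(64, 2 * channels),
        )
        self.decoder = nn.Conv2d(channels, 1, 1)

    def forward(self, image: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(image)  # (B, C, H, W)
        scale, shift = self.time_mlp(encode_time(t, self.num_freqs)).chunk(2, dim=-1)
        feats = feats * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
        logits = self.decoder(torch.relu(feats))  # (B, 1, H, W)
        b, _, h, w = logits.shape
        # Softmax over all pixels so each map is a probability distribution.
        return torch.softmax(logits.view(b, -1), dim=-1).view(b, 1, h, w)


model = ContinuousSaliencyNet()
image = torch.rand(2, 3, 32, 32)       # random stand-in for two RGB images
durations = torch.tensor([0.5, 2.7])   # any real-valued duration in seconds
saliency = model(image, durations)     # (2, 1, 32, 32), each map sums to 1
```

Because the duration enters as a continuous input, the same network can be queried at 0.5 s, 2.7 s, or any other value without adding a separate output head per timestep.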
Goals:
* Process the raw gaze data from our internal dataset GazeRecall to obtain the saliency ground truth.
* Follow [1] to develop an MLP-CNN-based model for multi-duration saliency prediction.
* Evaluate the model performance.
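The data preparation and evaluation steps above can be sketched as follows. The format of GazeRecall is not described here, so the (x, y, onset) fixation layout, the Gaussian width sigma, and the function names are assumptions. The sketch blurs all fixations that begin before a chosen time into a normalized map, then compares two maps with the linear correlation coefficient (CC), one commonly used saliency metric.

```python
import numpy as np


def saliency_from_fixations(fixations, t_end, height, width, sigma=2.0):
    """Blur fixations with onset <= t_end (seconds) into a normalized map.

    fixations: iterable of (x, y, onset_seconds); this layout is an assumption.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    sal = np.zeros((height, width), dtype=np.float64)
    for x, y, onset in fixations:
        if onset <= t_end:
            # Add an isotropic Gaussian centered on the fixation.
            sal += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    total = sal.sum()
    return sal / total if total > 0 else sal


def pearson_cc(pred, target):
    """Linear correlation coefficient (CC) between two saliency maps."""
    p = pred - pred.mean()
    q = target - target.mean()
    denom = np.sqrt((p ** 2).sum() * (q ** 2).sum())
    return float((p * q).sum() / denom) if denom > 0 else 0.0


# Made-up fixations for illustration: (x, y, onset in seconds).
fixations = [(8, 8, 0.2), (20, 12, 1.4), (24, 24, 3.9)]
gt_short = saliency_from_fixations(fixations, t_end=0.5, height=32, width=32)
gt_long = saliency_from_fixations(fixations, t_end=5.0, height=32, width=32)
cc_short_vs_long = pearson_cc(gt_short, gt_long)
```

Since t_end is a free parameter, ground-truth maps can be generated for arbitrary durations, which is what a continuous model would be supervised and evaluated against.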
Supervisors: Chuhan Jiao and Yao Wang
Distribution: 10% Literature, 15% Data preparation, 60% Implementation, 30% Evaluation
Requirements: Strong programming skills in Python and PyTorch. Interest in implicit neural representation learning.
Literature: [1] Doukas, Michail Christos, Stylianos Ploumpis, and Stefanos Zafeiriou. "Dynamic Neural Portraits." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023.
[2] Fosco, Camilo, et al. "How much time do you have? Modeling multi-duration saliency." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
[3] Aydemir, Bahar, et al. "TempSAL - Uncovering Temporal Information for Deep Saliency Prediction." arXiv preprint arXiv:2301.02315 (2023).