
Spiking Neural Networks for Generalization in RL


Description: While deep Reinforcement Learning (RL) has shown promising results in many domains, RL agents are known to overfit to their training environment and struggle to transfer past experience to new tasks and environments [1]. In recent years, an increasing number of benchmarks for assessing generalization in RL agents have been established [2,3,4,5]. These benchmarks remain challenging despite increasing efforts to leverage compute-intensive and energy-hungry deep learning approaches [6].

Encoding environment states in a high-dimensional representation with Spiking Neural Networks (SNNs) has been shown to improve robustness and efficiency in RL, particularly in navigation environments [7,8,9]. By training a novel encoder that maps environment states to SNN representations end-to-end, this project aims to improve robustness in navigation and thereby enable generalization across different environments.
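
To make the encoding idea concrete, below is a minimal sketch (in JAX, which the project will use) of a rate-coded leaky integrate-and-fire (LIF) population encoder. All names and hyperparameters (lif_step, encode_state, tau, hidden_dim) are illustrative assumptions, not part of an existing codebase.

```python
# Minimal sketch of a rate-coded LIF spiking encoder in pure JAX.
# All names and sizes here are illustrative assumptions.
import jax
import jax.numpy as jnp


def lif_step(membrane, input_current, tau=0.9, threshold=1.0):
    """One leaky integrate-and-fire step: decay, integrate, spike, reset."""
    membrane = tau * membrane + input_current
    spikes = (membrane >= threshold).astype(jnp.float32)
    membrane = jnp.where(spikes > 0, 0.0, membrane)  # hard reset after a spike
    return membrane, spikes


def encode_state(params, obs, n_steps=16):
    """Project a flat observation into a high-dimensional spike-rate code."""
    # Linear projection into the (much larger) spiking population.
    current = obs @ params["w_in"] + params["b_in"]

    def step(membrane, _):
        return lif_step(membrane, current)

    membrane0 = jnp.zeros_like(current)
    _, spike_train = jax.lax.scan(step, membrane0, None, length=n_steps)
    # Average spike count over time = rate code used as the state representation.
    return spike_train.mean(axis=0)


key = jax.random.PRNGKey(0)
obs_dim, hidden_dim = 147, 1024  # e.g. a flattened 7x7x3 MiniGrid observation
params = {
    "w_in": jax.random.normal(key, (obs_dim, hidden_dim)) * 0.1,
    "b_in": jnp.zeros(hidden_dim),
}
obs = jnp.ones(obs_dim)
representation = encode_state(params, obs)  # shape (1024,)
```

For end-to-end training, the non-differentiable spike threshold would in practice be handled with a surrogate gradient (e.g. a straight-through or sigmoid-derivative estimator), and the resulting rate code would be fed into the agent's policy and value networks.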

Goal: Working on the Generalization Challenge using suitable environments, such as XLand-Minigrid [10], by leveraging the robustness and efficiency of Spiking Neural Networks (SNNs).
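
For orientation, here is a hedged rollout sketch against XLand-MiniGrid's functional, gymnax-style JAX interface. The environment ID, the assumed action count, and the timestep field names are assumptions and should be checked against the library's current documentation.

```python
# Hedged sketch of interacting with an XLand-MiniGrid environment in JAX.
# Environment ID, action count, and timestep fields are assumptions; consult
# the xminigrid documentation for the exact registry names and API.
import jax
import xminigrid

key = jax.random.PRNGKey(0)
key, reset_key = jax.random.split(key)

# make() is assumed to return the environment together with its default parameters.
env, env_params = xminigrid.make("MiniGrid-Empty-8x8")

num_actions = 6  # MiniGrid-style discrete action space (assumed)

# Functional reset/step: the timestep object carries observation, reward, and state.
timestep = env.reset(env_params, reset_key)
for _ in range(10):
    key, action_key = jax.random.split(key)
    action = jax.random.randint(action_key, (), 0, num_actions)
    timestep = env.step(env_params, timestep, action)

print(timestep.reward, timestep.observation.shape)
```

In the project, the SNN encoder sketched above would process the observation before it reaches the policy, and the whole rollout can be jit-compiled and vectorized over many environments, which is the main appeal of the JAX-based benchmarks.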

Supervisor: Anna Penzkofer and Constantin Ruhdorfer

Distribution: 20% literature review, 60% implementation, 20% analysis

Requirements: Good knowledge of deep learning and reinforcement learning, strong programming skills in Python and PyTorch and/or Jax, interest in cognitive modeling, and self-management skills. The thesis requires learning Jax along the way; prior experience in PyTorch is sufficient to start.

Literature: [1] Cobbe, K., Klimov, O., Hesse, C., Kim, T., & Schulman, J. (2019, May). Quantifying generalization in reinforcement learning. In International Conference on Machine Learning (pp. 1282-1289). PMLR.

[2] Cobbe, K., Hesse, C., Hilton, J., & Schulman, J. (2020, November). Leveraging procedural generation to benchmark reinforcement learning. In International conference on machine learning (pp. 2048-2056). PMLR.

[3] Jiang, M., Dennis, M., Grefenstette, E., & Rocktäschel, T. (2023). minimax: Efficient Baselines for Autocurricula in JAX. arXiv preprint arXiv:2311.12716.

[4] Jiang, M., Grefenstette, E., & Rocktäschel, T. (2021, July). Prioritized level replay. In International Conference on Machine Learning (pp. 4940-4950). PMLR.

[5] Chevalier-Boisvert, M., Dai, B., Towers, M., Perez-Vicente, R. D. L., Willems, L., Lahlou, S., … Terry, J. K. (2023). Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks. Thirty-Seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track.

[6] Bauer, J., Baumli, K., Behbahani, F., Bhoopchand, A., Bradley-Schmieg, N., Chang, M., … Zhang, L. (2023). Human-timescale adaptation in an open-ended task space. Proceedings of the 40th International Conference on Machine Learning.

[7] Banino, A., Barry, C., Uria, B., Blundell, C., Lillicrap, T., Mirowski, P., ... & Kumaran, D. (2018). Vector-based navigation using grid-like representations in artificial agents. Nature, 557(7705), 429-433.

[8] Bartlett, M., Stewart, T. C., & Orchard, J. (2022). Fast Online Reinforcement Learning with Biologically-Based State Representations. In Proceedings of the 20th international conference on cognitive modeling.

[9] Bartlett, M., Simone, K., Dumont, N. D., Furlong, M., Eliasmith, C., Orchard, J., & Stewart, T. (2023). Improving reinforcement learning with biologically motivated continuous state representations. In Proceedings of the 21st International Conference on Cognitive Modeling.

[10] Nikulin, A., Kurenkov, V., Zisman, I., Agarkov, A., Sinii, V., & Kolesnikov, S. (2023). XLand-minigrid: Scalable meta-reinforcement learning environments in JAX. arXiv preprint arXiv:2312.12044.