Research Papers
2026
Lupu, E. S., Spieler, P., Javed, K., De Asis, K., Martin, J. D., Steenstrup, M., Modayil, J. (2026). The Open Ant: a robot platform for reinforcement learning research. Under review.
Javed, K., Modayil, J., Kennickell, G., Sutton, R. S., Carmack, J. (2026). Physical Atari: a robust and accessible platform for real-time reinforcement learning on robots. Under review.
Martin, J. D., Mince, F., Saleh, E., Pajak, A. (2026). Artifacts as memory beyond the agent boundary. Under review.
De Asis, K., Elsayed, M., He, J. (2026). Extending differential temporal difference methods for episodic problems. Under review.
Sharifnassab, A., Elsayed, M., De Asis, K., Mahmood, A. R., Sutton, R. S. (2026). Intentional updates for streaming reinforcement learning. Under review.
2025
Pickett, M., Nain, A. K., Modayil, J., Jones, L. (2025). The ungrounded alignment problem. In ICDL 2025.
Sharifnassab, A., Salehkaleybar, S., Sutton, R. S. (2025). MetaOptimize: a framework for optimizing step sizes and other meta-parameters. In ICML 2025.
2024
De Asis, K., Sutton, R. S. (2024). An idiosyncrasy of time-discretization in reinforcement learning. In RLC 2024.
Sharifnassab, A., Salehkaleybar, S., Ghiassian, S., Kanoria, S., Schuurmans, D. (2024). Soft preference optimization: aligning language models to expert distributions. Pre-print.
