Research Papers

De Asis, K., Elsayed, M., He, J. (2026). Extending differential temporal difference methods for episodic problems. In RLC 2026.
Javed, K., Modayil, J., Kennickell, G., Sutton, R. S., Carmack, J. (2026). Physical Atari: a robust and accessible platform for real-time reinforcement learning on robots. In RLC 2026.
Lupu, E.-S., Spieler, P., Javed, K., De Asis, K., Martin, J. D., Steenstrup, M., Modayil, J. (2026). The Open Ant: A robot platform for reinforcement learning research. In RLC 2026.
Martin, J. D., Mince, F., Saleh, E., Pajak, A. (2026). Artifacts as memory beyond the agent boundary. Pre-print.
Ono, M., Selva, D., Cable, M. L., Ethvignot, M., Hansen, M., Hein, A. M., Lupu, E.-S., Manchester, Z., Murrow, D., Pozarycki, C., Spino, P., Stockton, A., Choukroun, M., Chung, S.-J., Day, J., Demagall, A., Freeman, A., Gentgen, C., Ingham, M. D., Phillips-Lander, C. M., Rieber, R., Salado, A., Sakovsky, M., Shiraishi, L. R., Yue, Y., Zacny, K. (2026). Planetary exploration 3.0: A roadmap for software-defined, radically adaptive space systems. In AIAA ASCEND 2026.
Sharifnassab, A., Elsayed, M., De Asis, K., Mahmood, A. R., Sutton, R. S. (2026). Intentional updates for streaming reinforcement learning. In ICML 2026.

Pickett, M., Nain, A. K., Modayil, J., Jones, L. (2025). The ungrounded alignment problem. In ICDL 2025.
Sharifnassab, A., Salehkaleybar, S., Sutton, R. S. (2025). MetaOptimize: a framework for optimizing step sizes and other meta-parameters. In ICML 2025.

De Asis, K., Sutton, R. S. (2024). An idiosyncrasy of time-discretization in reinforcement learning. In RLC 2024.
Sharifnassab, A., Salehkaleybar, S., Ghiassian, S., Kanoria, S., Schuurmans, D. (2024). Soft preference optimization: aligning language models to expert distributions. Pre-print.

Sharifnassab, A., Freeman, A., Du, H., Aminmansour, F., Sutton, R. S. Meta-descent normalization in continual learning.
Sharifnassab, A., Sutton, R. S. Tabular TD as local optimization in L1 geometry.