2024-11: NSF IRL Grant

A new three-year grant, "Inverse Reinforcement Learning with Heterogeneous Data: Estimation Algorithms with Finite Time and Sample Guarantees," has been awarded by the NSF. In this work, we develop theory and algorithms for LLM alignment (e.g., RLHF and DPO) from an inverse reinforcement learning perspective.