Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning Paper • 2407.00617 • Published Jun 30, 2024
Offline Learning in Markov Games with General Function Approximation Paper • 2302.02571 • Published Feb 6, 2023
CIDAR: Culturally Relevant Instruction Dataset For Arabic Paper • 2402.03177 • Published Feb 5, 2024