J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu, Yilun Zhou, Xuan-Phi Nguyen, Caiming Xiong, and Shafiq Joty. 2025.