Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao, Geyang Guo, Xingxing Zhang, Nancy F. Chen, Shafiq Joty, and Furu Wei. In International Conference on Learning Representations (ICLR-25 spotlight) 2025.