Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao, Geyang Guo, Xingxing Zhang, Nancy F., Shafiq Joty, and Furu Wei. In International Conference on Learning Representations (ICLR), 2025 (spotlight).