Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao, Geyang Guo, Xingxing Zhang, Nancy Chen, Shafiq Joty, and Furu Wei. In 2024.
BibTex Slides