Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
Shrey Pandit, Austin Xu, Xuan-Phi Nguyen, Yifei Ming, Caiming Xiong, and Shafiq Joty. 2025.