StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs
Hailin Chen, Fangkai Jiao, Mathieu Ravaut, Nawshad Farruque, Xuan Phi, Chengwei Qin, Manan Dey, Bosheng Ding, Caiming Xiong, Shafiq Joty, and Yingbo Zhou. 2025.
PDF BibTex Slides