Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings

Austin Xu, Srijan Bansal, Yifei Ming, Semih Yavuz, and Shafiq Joty. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL-25), 2025.