Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?
Tahmid Laskar, Mohammed Islam, Ridwan Mahbub, Ahmed Masry, Mizanur Rahman, Amran Bhuiyan, Mir Nayeem, Shafiq Joty, Enamul Hoque, and Jimmy Huang. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL-25, Industry Track), 2025.