Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction

Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, and Shafiq Joty. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL-26 Findings), 2026.