2025

Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement
Yuxin Ren, Maxwell D. Collins, Miao Hu, and Huanrui Yang
arXiv preprint arXiv:2505.21535, 2025

FIER: Fine-Grained and Efficient KV Cache Retrieval for Long-context LLM Inference
Dongwei Wang, Zijie Liu, Song Wang, Yuxin Ren, Jianing Deng, Jingtong Hu, and 2 more authors
arXiv preprint arXiv:2508.08256, 2025