2025

Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement
Yuxin Ren, Maxwell D. Collins, Miao Hu, and Huanrui Yang
arXiv preprint arXiv:2505.21535, 2025

FIER: Fine-Grained and Efficient KV Cache Retrieval for Long-context LLM Inference
Dongwei Wang, Zijie Liu, Song Wang, Yuxin Ren, Jianing Deng, Jingtong Hu, and 2 more authors
arXiv preprint arXiv:2508.08256, 2025