publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement
    Yuxin Ren, Maxwell D. Collins, Miao Hu, and Huanrui Yang
    arXiv preprint arXiv:2505.21535, 2025
  2. FIER: Fine-Grained and Efficient KV Cache Retrieval for Long-context LLM Inference
    Dongwei Wang, Zijie Liu, Song Wang, Yuxin Ren, Jianing Deng, Jingtong Hu, and 2 more authors
    arXiv preprint arXiv:2508.08256, 2025