Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training
Q-RAG is a resource-efficient method for multi-step retrieval trained with reinforcement learning directly in the latent space of text-chunk embeddings. Instead of expensive LLM fine-tuning, Q-RAG trains only a lightweight embedder agent using value-based RL (temporal difference learning), keeping the LLM frozen.
This approach achieves state-of-the-art results on long-context benchmarks such as BABILong and RULER for contexts of up to 10M tokens, as well as competitive performance on open-domain multi-hop QA benchmarks (HotpotQA, MuSiQue).
- Paper: Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training
- Repository: GitHub - griver/Q-RAG
- Oral Presentation: ICLR 2026
Summary
Most existing Retrieval-Augmented Generation (RAG) methods focus on single-step retrieval. Q-RAG proposes fine-tuning the embedder model for multi-step retrieval using reinforcement learning (RL). It offers a competitive, resource-efficient alternative to existing multi-step retrieval methods and maintains performance even as context grows significantly.
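To make the idea concrete, here is a minimal, self-contained sketch of value-based embedder training for multi-step retrieval. It is not the paper's implementation: the state-update rule, the linear query-side projection, the reward scheme, and all hyperparameters are illustrative assumptions. It only shows the shape of the approach: chunk embeddings stay frozen, Q-values are inner products in embedding space, and a temporal-difference update trains the query-side parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 16       # embedding dimension (illustrative)
N_CHUNKS = 8   # number of candidate text chunks
GAMMA = 0.9    # discount factor
LR = 0.1       # learning rate for the TD update

# Frozen chunk embeddings (stand-ins for a real embedder's output).
chunks = rng.normal(size=(N_CHUNKS, DIM))

# Trainable query-side projection: Q(s, a) = (W @ s) . e_a
W = np.eye(DIM)

def q_values(state):
    """Score every candidate chunk against the current query state."""
    return chunks @ (W @ state)

def td_step(state, action, reward, next_state, done):
    """One temporal-difference update on the query-side projection W."""
    global W
    target = reward if done else reward + GAMMA * q_values(next_state).max()
    td_error = target - q_values(state)[action]
    # Gradient of Q(s, a) with respect to W is outer(e_a, s).
    W += LR * td_error * np.outer(chunks[action], state)

# Multi-step retrieval episode: greedily pick chunks, then update the
# state by mixing in the retrieved chunk's embedding (one simple choice;
# the actual state-update mechanism is an assumption here).
state = rng.normal(size=DIM)
retrieved = []
for step in range(3):
    a = int(np.argmax(q_values(state)))
    retrieved.append(a)
    next_state = 0.5 * state + 0.5 * chunks[a]
    reward = 1.0 if step == 2 else 0.0  # terminal reward, e.g. answer found
    td_step(state, a, reward, next_state, done=(step == 2))
    state = next_state
```

Because only `W` (the lightweight query-side parameters) receives gradients, the expensive components, the chunk embeddings and the downstream LLM, never need fine-tuning; this is the resource-efficiency argument in miniature.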
Citation
If you find Q-RAG useful, please cite the following work:
@inproceedings{sorokin2026qrag,
  title     = {{Q-RAG}: Long Context Multi-Step Retrieval via Value-Based Embedder Training},
  author    = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Alexander and Inozemcev, Oleg and Vedernikov, Egor and Anokhin, Petr and Burtsev, Mikhail and Trushkov, Alexey and Yin, Wenshuai and Burnaev, Evgeny},
  booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)},
  year      = {2026}
}

@article{sorokin2025qrag,
  title   = {{Q-RAG}: Long Context Multi-Step Retrieval via Value-Based Embedder Training},
  author  = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Alexander and Inozemcev, Oleg and Vedernikov, Egor and Anokhin, Petr and Burtsev, Mikhail and Trushkov, Alexey and Yin, Wenshuai and Burnaev, Evgeny},
  journal = {arXiv preprint arXiv:2511.07328},
  year    = {2025}
}