Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training

Q-RAG is a resource-efficient method for multi-step retrieval trained with reinforcement learning directly in the latent space of text-chunk embeddings. Instead of expensive LLM fine-tuning, Q-RAG trains only a lightweight embedder agent using value-based RL (temporal difference learning), keeping the LLM frozen.
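To make the value-based part concrete, here is a minimal sketch of a generic one-step temporal-difference (TD) target, the quantity a Q-function is regressed toward. It is textbook Q-learning, not the exact loss used in Q-RAG. The names `td_target` and `td_error`, the discount `gamma`, and the reading of actions as "which chunk to retrieve next" are assumptions made for illustration.

```python
from typing import Sequence


def td_target(
    reward: float,
    next_q_values: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
    done: bool = False,
) -> float:
    """One-step TD target: r + gamma * max_a' Q(s', a').

    `next_q_values` holds the Q-values of every action available in the
    next state (here: hypothetically, one value per candidate chunk).
    """
    # At the end of an episode there is no future value to bootstrap from.
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def td_error(
    q_value: float,
    reward: float,
    next_q_values: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Difference between the TD target and the current estimate Q(s, a)."""
    return td_target(reward, next_q_values, gamma, done) - q_value
```

In a training loop the squared TD error would typically be minimized with respect to the embedder's parameters only, which is consistent with the LLM staying frozen.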

Q-RAG achieves state-of-the-art results on long-context benchmarks such as BABILong and RULER for contexts of up to 10M tokens, and competitive performance on open-domain multi-hop QA benchmarks (HotpotQA, MuSiQue).

Summary

Most existing Retrieval-Augmented Generation (RAG) methods focus on single-step retrieval. Q-RAG instead fine-tunes the embedder model for multi-step retrieval using reinforcement learning (RL). It offers a competitive, resource-efficient alternative to existing multi-step retrieval methods and maintains performance as the context grows, up to 10M tokens.
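The difference between single-step and multi-step retrieval can be illustrated with a small greedy loop: at each step the state is the query plus the chunks retrieved so far, and the next chunk is the one with the highest score given that state. This is a sketch under stated assumptions, not the Q-RAG implementation. `multi_step_retrieve` and `toy_score` are hypothetical names, and the word-overlap score is only a stand-in for a learned Q-value computed from embeddings.

```python
from typing import Callable, List, Sequence


def toy_score(state: Sequence[str], chunk: str) -> float:
    """Hypothetical stand-in for a learned Q-value: word overlap between
    the current state (query + retrieved chunks) and a candidate chunk."""
    state_words = set(" ".join(state).lower().split())
    chunk_words = set(chunk.lower().split())
    return float(len(state_words & chunk_words))


def multi_step_retrieve(
    query: str,
    chunks: Sequence[str],
    score: Callable[[Sequence[str], str], float],
    steps: int,
) -> List[int]:
    """Greedily pick one chunk per step, re-scoring after every pick."""
    selected: List[int] = []
    remaining = list(range(len(chunks)))
    for _ in range(min(steps, len(chunks))):
        # The state grows with every retrieved chunk, so later steps can
        # follow clues that were not present in the original query.
        state = [query] + [chunks[i] for i in selected]
        best = max(remaining, key=lambda i: score(state, chunks[i]))
        selected.append(best)
        remaining.remove(best)
    return selected


chunks = [
    "john is in the garden",
    "mary took the apple",
    "the apple is in the kitchen",
]
hops = multi_step_retrieve("where is the item mary took", chunks, toy_score, steps=2)
print(hops)
```

In this toy example the query alone cannot separate the first and third chunks (both share only "is" and "the" with it). Once "mary took the apple" has been retrieved, the word "apple" enters the state and the third chunk wins the second step.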

Citation

If you find Q-RAG useful, please cite the following work:

@inproceedings{sorokin2026qrag,
  title     = {{Q-RAG}: Long Context Multi-Step Retrieval via Value-Based Embedder Training},
  author    = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Alexander and Inozemcev, Oleg and Vedernikov, Egor and Anokhin, Petr and Burtsev, Mikhail and Trushkov, Alexey and Yin, Wenshuai and Burnaev, Evgeny},
  booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)},
  year      = {2026}
}

@article{sorokin2025qrag,
  title   = {{Q-RAG}: Long Context Multi-Step Retrieval via Value-Based Embedder Training},
  author  = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Alexander and Inozemcev, Oleg and Vedernikov, Egor and Anokhin, Petr and Burtsev, Mikhail and Trushkov, Alexey and Yin, Wenshuai and Burnaev, Evgeny},
  journal = {arXiv preprint arXiv:2511.07328},
  year    = {2025}
}