O-Researcher: An Open-Ended Deep Research Model via Multi-Agent Distillation and Agentic RL
Abstract
A multi-agent framework synthesizes high-quality instructional data for open-source large language models, achieving state-of-the-art performance through a two-stage training approach combining supervised fine-tuning and reinforcement learning.
The performance gap between closed-source and open-source large language models (LLMs) is largely attributed to disparities in access to high-quality training data. To bridge this gap, we introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow in which collaborative AI agents simulate complex tool-integrated reasoning to generate diverse, high-fidelity data end-to-end. Leveraging this synthesized data, we develop a two-stage training strategy that integrates supervised fine-tuning with a novel reinforcement learning method, designed to maximize model alignment and capability. Extensive experiments demonstrate that our framework empowers open-source models across multiple scales, enabling them to achieve new state-of-the-art performance on major deep research benchmarks. This work provides a scalable and effective pathway for advancing open-source LLMs without relying on proprietary data or models.
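To make the high-level pipeline concrete, below is a minimal, hypothetical Python sketch of the recipe the abstract describes: multi-agent synthesis of tool-integrated reasoning traces, followed by supervised fine-tuning and then a reinforcement learning stage. Every name here (planner_agent, executor_agent, sft_update, rl_update, reward_fn) and the toy reward are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the two-stage pipeline from the abstract.
# All function names and the reward are placeholders, not the authors' API.

import random


def planner_agent(question):
    """Agent 1: decompose the question into tool-use steps (stubbed)."""
    return [f"search: {question}", f"summarize: {question}"]


def executor_agent(step):
    """Agent 2: simulate executing one tool-integrated step (stubbed)."""
    return f"result of [{step}]"


def synthesize_trajectory(question):
    """Multi-agent data synthesis: collaborating agents produce one
    tool-integrated reasoning trace end-to-end."""
    steps = planner_agent(question)
    trace = [(s, executor_agent(s)) for s in steps]
    answer = f"answer derived from {len(trace)} steps"
    return {"question": question, "trace": trace, "answer": answer}


def sft_update(model, example):
    """Stage 1: supervised fine-tuning on a synthesized trajectory (stubbed)."""
    model["sft_steps"] += 1


def rl_update(model, example, reward):
    """Stage 2: reinforcement-learning update weighted by reward (stubbed)."""
    model["rl_steps"] += 1
    model["avg_reward"] += (reward - model["avg_reward"]) / model["rl_steps"]


def reward_fn(example):
    """Placeholder reward, e.g. answer correctness or trace quality."""
    return random.random()


if __name__ == "__main__":
    model = {"sft_steps": 0, "rl_steps": 0, "avg_reward": 0.0}
    data = [synthesize_trajectory(f"research question {i}") for i in range(100)]
    for ex in data:          # Stage 1: SFT on synthetic traces
        sft_update(model, ex)
    for ex in data:          # Stage 2: agentic RL on top of the SFT model
        rl_update(model, ex, reward_fn(ex))
    print(model)
```

The key design choice this sketch mirrors is that the same synthesized trajectories feed both stages: SFT imitates the traces, and RL then optimizes beyond imitation against a reward signal.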