arXiv:2601.03743

O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL

Published on Jan 7

Abstract

AI-generated summary: A multi-agent framework synthesizes high-quality instructional data for open-source large language models, achieving state-of-the-art performance through a two-stage training approach combining supervised fine-tuning and reinforcement learning.

The performance gap between closed-source and open-source large language models (LLMs) is largely attributed to disparities in access to high-quality training data. To bridge this gap, we introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow where collaborative AI agents simulate complex tool-integrated reasoning to generate diverse and high-fidelity data end-to-end. Leveraging this synthesized data, we develop a two-stage training strategy that integrates supervised fine-tuning with a novel reinforcement learning method, designed to maximize model alignment and capability. Extensive experiments demonstrate that our framework empowers open-source models across multiple scales, enabling them to achieve new state-of-the-art performance on the major deep research benchmark. This work provides a scalable and effective pathway for advancing open-source LLMs without relying on proprietary data or models.
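
The abstract outlines a concrete pipeline: collaborating agents synthesize tool-integrated research trajectories, low-fidelity samples are filtered out, and an open-source model is then trained in two stages (supervised fine-tuning, then reinforcement learning). As a rough illustration of that shape, here is a minimal, self-contained Python sketch; the agent roles (planner, researcher, tool, writer), the verifier, and the training stubs are all hypothetical stand-ins, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    """One synthesized research trace: question, tool-use steps, final answer."""
    question: str
    steps: list = field(default_factory=list)   # (thought, observation) pairs
    answer: str = ""

def call_agent(role: str, prompt: str) -> str:
    """Stub for an LLM call; a real pipeline would query a strong teacher model."""
    return f"[{role} -> {prompt[:40]}]"

def synthesize(question: str, max_steps: int = 3) -> Trajectory:
    """Collaborating agents simulate tool-integrated reasoning end to end."""
    traj = Trajectory(question)
    plan = call_agent("planner", question)
    for _ in range(max_steps):
        thought = call_agent("researcher", plan)
        observation = call_agent("tool", thought)  # e.g. web search, code execution
        traj.steps.append((thought, observation))
    traj.answer = call_agent("writer", question)
    return traj

def is_high_fidelity(traj: Trajectory) -> bool:
    """Placeholder verifier: keep only trajectories with steps and an answer."""
    return bool(traj.steps and traj.answer)

def two_stage_train(corpus: list) -> None:
    # Stage 1: supervised fine-tuning on the verified trajectories
    # (hypothetically, minimizing NLL of each full trajectory text).
    sft_batch = [(t.question, t.answer) for t in corpus]
    # Stage 2: reinforcement learning with a task-level reward
    # (hypothetically, a policy-gradient update against the reward).
    rewards = [1.0 if is_high_fidelity(t) else 0.0 for t in corpus]
    print(f"SFT on {len(sft_batch)} samples; "
          f"RL mean reward {sum(rewards) / max(len(rewards), 1):.2f}")

if __name__ == "__main__":
    questions = ["Survey recent work on agentic RL for deep research"]
    corpus = [t for t in map(synthesize, questions) if is_high_fidelity(t)]
    two_stage_train(corpus)
```

Note how the two stages are decoupled: the SFT pass needs only the synthesized traces, while the RL pass needs a reward signal on top of them, so the same synthetic corpus can drive both stages without touching proprietary data or models, consistent with the claim in the abstract.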

Models citing this paper: 2

Datasets citing this paper: 2

Spaces citing this paper: 0

Collections including this paper: 0