Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 26 days ago • 323
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 375
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 364
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data Paper • 2604.19859 • Published 13 days ago • 51
SWE-chat: Coding Agent Interactions From Real Users in the Wild Paper • 2604.20779 • Published 12 days ago • 14
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 12 days ago • 239
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published 14 days ago • 82
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Paper • 2604.18224 • Published 14 days ago • 22
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents Paper • 2604.18543 • Published 14 days ago • 27
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 20 days ago • 88
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published 25 days ago • 289
MARS: Enabling Autoregressive Models Multi-Token Generation Paper • 2604.07023 • Published 26 days ago • 38
ClawArena: Benchmarking AI Agents in Evolving Information Environments Paper • 2604.04202 • Published 29 days ago • 37
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome Paper • 2603.28407 • Published Mar 30 • 70
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 349
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published Mar 24 • 91
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models Paper • 2603.22212 • Published Mar 23 • 126
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 109
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents Paper • 2603.18815 • Published Mar 19 • 14