- The Smol Training Playbook
  📚 3k • The secrets to building world-class LLMs
- LLM-in-Sandbox Elicits General Agentic Intelligence
  Paper • 2601.16206 • Published • 84
- EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
  Paper • 2601.15876 • Published • 90
- BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
  Paper • 2510.08697 • Published • 39
Collections including paper arxiv:2510.08697

- THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
  Paper • 2509.13761 • Published • 16
- Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
  Paper • 2509.25849 • Published • 48
- Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
  Paper • 2510.03561 • Published • 25
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 508

- The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
  Paper • 2506.18403 • Published • 3
- ReCode: Updating Code API Knowledge with Reinforcement Learning
  Paper • 2506.20495 • Published • 10
- SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
  Paper • 2507.23348 • Published • 12
- LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
  Paper • 2509.09614 • Published • 7

- Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
  Paper • 2505.19443 • Published • 15
- Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
  Paper • 2506.19290 • Published • 53
- CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks
  Paper • 2105.12655 • Published
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 152

- BigCodeArena
  🚀 37 • Compare two AI models by sending them code and seeing their responses
- BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
  Paper • 2510.08697 • Published • 39
- bigcode/bigcodearena-raw-14k
  Viewer • Updated • 14.1k • 29 • 2
- bigcode/bigcodearena-preference-5k
  Viewer • Updated • 4.73k • 62 • 1

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 83 • 98
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 38
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88