Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published Nov 7 • 52
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published 29 days ago • 104
DeepAgent: A General Reasoning Agent with Scalable Toolsets Paper • 2510.21618 • Published Oct 24 • 99
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21 • 110
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16 • 105
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning Paper • 2509.23873 • Published Sep 28 • 67
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play Paper • 2509.25541 • Published Sep 29 • 140
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models Paper • 2509.09675 • Published Sep 11 • 28
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML Paper • 2509.06806 • Published Sep 8 • 63
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11 • 34
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published Sep 8 • 79
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published Jul 30 • 66
Efficient Agents: Building Effective Agents While Reducing Cost Paper • 2508.02694 • Published Jul 24 • 85
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 142