Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning Paper • 2510.03259 • Published Sep 26 • 57
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense Paper • 2510.07242 • Published Oct 8 • 30
First Try Matters: Revisiting the Role of Reflection in Reasoning Models Paper • 2510.08308 • Published Oct 9 • 24
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published Oct 3 • 75
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States Paper • 2510.11052 • Published Oct 13 • 51
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment Paper • 2510.10201 • Published Oct 11 • 35
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13 • 176
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning Paper • 2510.27044 • Published Oct 30 • 5
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 136
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4 • 75
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8 • 55
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR Paper • 2509.23808 • Published Sep 28 • 47