TEMPO: Scaling Test-time Training for Large Reasoning Models Paper • 2604.19295 • Published 9 days ago • 34
Learning Self-Correction in Vision-Language Models via Rollout Augmentation Paper • 2602.08503 • Published Feb 9 • 3
Octopus Collection • RL checkpoints of Octopus-8B and baselines for the paper "Learning Self-Correction in Vision–Language Models via Rollout Augmentation" • 6 items • Updated Feb 9