MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding • Paper • 2503.13964 • Published Mar 18 • 20
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training • Paper • 2510.06710 • Published Oct 8 • 38
VIDEOP2R: Video Understanding from Perception to Reasoning • Paper • 2511.11113 • Published 22 days ago • 111