RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework Paper • 2604.15308 • Published 22 days ago • 29
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published Apr 6 • 235
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving Paper • 2604.02190 • Published Apr 2 • 29
Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning Paper • 2602.21186 • Published Feb 24 • 5
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models Paper • 2512.15713 • Published Dec 17, 2025 • 18
Towards Scalable Pre-training of Visual Tokenizers for Generation Paper • 2512.13687 • Published Dec 15, 2025 • 106
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models Paper • 2512.08829 • Published Dec 9, 2025 • 21
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 182
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 220