- GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation — Paper 2512.01801 • Published 6 days ago • 22 upvotes
- World Simulation with Video Foundation Models for Physical AI — Paper 2511.00062 • Published Oct 28 • 40 upvotes
- Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process — Paper 2511.01718 • Published Nov 3 • 6 upvotes
- NaviTrace: Evaluating Embodied Navigation of Vision-Language Models — Paper 2510.26909 • Published Oct 30 • 13 upvotes
- MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer — Paper 2509.16197 • Published Sep 19 • 56 upvotes
- FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies — Paper 2509.04996 • Published Sep 5 • 13 upvotes
- BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models — Paper 2506.07961 • Published Jun 9 • 11 upvotes
- Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence — Paper 2505.23747 • Published May 29 • 68 upvotes
- Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model — Paper 2504.08685 • Published Apr 11 • 130 upvotes
- FLOWER VLA — Collection of checkpoints for the FLOWER VLA policy: a small, versatile VLA for language-conditioned robot manipulation with fewer than 1B parameters • 10 items • Updated Sep 17 • 4 upvotes
- Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k — Paper 2503.09642 • Published Mar 12 • 19 upvotes
- SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features — Paper 2502.14786 • Published Feb 20 • 155 upvotes