EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
Paper • 2508.21112 • Published • 78
EmbodiedOneVision is a unified framework for multimodal embodied reasoning and robot control, featuring interleaved vision-text-action pretraining.