Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs • Paper 2510.24514 • Published Oct 28
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models • Paper 2506.07961 • Published Jun 9