GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization Paper • 2511.15705 • Published 18 days ago • 91
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published Oct 30 • 81
UME-R1 Collection • UME-R1 is a framework designed to endow multimodal embedding models with the flexibility to switch between discriminative and generative embeddings • 4 items • Updated Nov 4 • 8
RoboOmni: Proactive Robot Manipulation in Omni-modal Context Paper • 2510.23763 • Published Oct 27 • 53
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models Paper • 2510.13626 • Published Oct 15 • 45
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations Paper • 2506.04633 • Published Jun 5 • 19
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Paper • 2505.13427 • Published May 19 • 26
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published May 18 • 24
CoMP: Continual Multimodal Pre-training for Vision Foundation Models Paper • 2503.18931 • Published Mar 24 • 30
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning Paper • 2503.10480 • Published Mar 13 • 55
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published Mar 10 • 61