WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning Paper • 2512.02425 • Published 5 days ago • 22
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published 19 days ago • 74
Back to Basics: Let Denoising Generative Models Denoise Paper • 2511.13720 • Published 20 days ago • 63
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published 26 days ago • 40
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models Paper • 2511.03317 • Published Nov 5 • 6
WithAnyone: Towards Controllable and ID Consistent Image Generation Paper • 2510.14975 • Published Oct 16 • 84
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks Paper • 2510.15019 • Published Oct 16 • 63
TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling Paper • 2510.04533 • Published Oct 6 • 47
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs Paper • 2510.09201 • Published Oct 10 • 49
Reinforcing Diffusion Models by Direct Group Preference Optimization Paper • 2510.08425 • Published Oct 9 • 11
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Paper • 2510.02283 • Published Oct 2 • 95
ACON: Optimizing Context Compression for Long-horizon LLM Agents Paper • 2510.00615 • Published Oct 1 • 32
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models Paper • 2509.17627 • Published Sep 22 • 66