Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published 6 days ago • 51
Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance Paper • 2601.14171 • Published 8 days ago • 47
Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published 14 days ago • 32
Inference-time Physics Alignment of Video Generative Models with Latent World Models Paper • 2601.10553 • Published 13 days ago • 12
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice Paper • 2601.05175 • Published 20 days ago • 34
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 20 days ago • 210
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published 23 days ago • 60
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models Paper • 2601.01321 • Published 25 days ago • 18
NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published 24 days ago • 43
SOP: A Scalable Online Post-Training System for Vision-Language-Action Models Paper • 2601.03044 • Published 22 days ago • 28
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published 22 days ago • 138
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published 26 days ago • 56
Exploring MLLM-Diffusion Information Transfer with MetaCanvas Paper • 2512.11464 • Published Dec 12, 2025 • 13