CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation Paper • 2604.19636 • Published 2 days ago • 72
HDR Video Generation via Latent Alignment with Logarithmic Encoding Paper • 2604.11788 • Published 10 days ago • 6
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 8 days ago • 107
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 8 days ago • 150
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published 10 days ago • 70
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 13 days ago • 47
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 25 days ago • 144
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published 27 days ago • 53
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published 27 days ago • 155
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published about 1 month ago • 123
Versatile Editing of Video Content, Actions, and Dynamics without Training Paper • 2603.17989 • Published Mar 18 • 17
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing Paper • 2603.19228 • Published Mar 19 • 68
WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation Paper • 2603.16871 • Published Mar 17 • 60
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published Mar 16 • 153
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation Paper • 2603.11647 • Published Mar 12 • 31
ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation Paper • 2603.11421 • Published Mar 12 • 34
EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation Paper • 2603.06014 • Published Mar 6 • 9
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published Mar 6 • 119