Music Flamingo: Scaling Music Understanding in Audio Language Models Paper • 2511.10289 • Published Nov 13, 2025 • 13
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published 17 days ago • 99
CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models Paper • 2601.05329 • Published 14 days ago • 1
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published 20 days ago • 53
Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation Paper • 2512.21734 • Published 28 days ago • 4
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published 24 days ago • 64
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing Paper • 2512.17909 • Published Dec 19, 2025 • 36
Scaling Behavior of Discrete Diffusion Language Models Paper • 2512.10858 • Published Dec 11, 2025 • 7
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis Paper • 2411.19509 • Published Nov 29, 2024 • 3
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion Paper • 2512.04926 • Published Dec 4, 2025 • 41
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows Paper • 2512.05150 • Published Dec 3, 2025 • 74
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published Dec 4, 2025 • 168
Article SARLO-80: Worldwide Slant SAR Language Optic Dataset at 80 cm Resolution Dec 1, 2025 • 4
Evaluating In Silico Creativity: An Expert Review of AI Chess Compositions Paper • 2510.23772 • Published Oct 27, 2025 • 2