UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist • Paper • 2511.08521 • Published 25 days ago • 37
Black-Box On-Policy Distillation of Large Language Models • Paper • 2511.10643 • Published 23 days ago • 46
Depth Anything 3: Recovering the Visual Space from Any Views • Paper • 2511.10647 • Published 23 days ago • 92
Music Flamingo: Scaling Music Understanding in Audio Language Models • Paper • 2511.10289 • Published 23 days ago • 10
Canvas-to-Image: Compositional Image Generation with Multimodal Controls • Paper • 2511.21691 • Published 10 days ago • 32