NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 12 items • Updated about 19 hours ago • 180
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data Jun 3, 2025 • 333
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published Jan 29 • 73
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications Paper • 2508.16279 • Published Aug 22, 2025 • 56
CoFrGeNet: Continued Fraction Architectures for Language Generation Paper • 2601.21766 • Published Jan 29 • 1
One Shot, One Talk: Whole-body Talking Avatar from a Single Image Paper • 2412.01106 • Published Dec 2, 2024 • 24
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published Feb 3, 2025 • 223
OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation Paper • 2508.19209 • Published Aug 26, 2025 • 42
Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis Paper • 2403.11487 • Published Mar 18, 2024 • 1
Style Customization of Text-to-Vector Generation with Image Diffusion Priors Paper • 2505.10558 • Published May 15, 2025 • 16
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8, 2025 • 186
Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated about 21 hours ago • 61