Collections including paper arxiv:2504.07491

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 263
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8 • 93
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 18
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 15

Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 132

academic papers

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published Apr 7 • 110
Slow-Fast Architecture for Video Multi-Modal Large Language Models

Paper • 2504.01328 • Published Apr 2 • 7
Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 132

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200
Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 132

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 54
Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 132
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 303
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

Paper • 2504.09925 • Published Apr 14 • 38

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5 • 74
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4 • 48
MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4 • 80
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Paper • 2506.03147 • Published Jun 3 • 58

Vision Language Models: 2025 Update

This collection includes all the models, datasets and Spaces mentioned in the blog Vision Language Models: 2025 Update

Qwen/Qwen2.5-Omni-7B

Any-to-Any • 11B • Updated Apr 30 • 143k • 1.83k
Qwen2.5 Omni 7B Demo 🏆

Running • Featured • 363
Generate text and speech from text, audio, images, and videos
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166
openbmb/MiniCPM-o-2_6

Any-to-Any • 9B • Updated Oct 5 • 100k • 1.27k

Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 132

Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking

Chat with Kimi-VL-A3B-Thinking-2506 🤔

Running on Zero • Featured • 178
Chat with images, videos, or PDFs to generate text
moonshotai/Kimi-VL-A3B-Thinking-2506

Image-Text-to-Text • 16B • Updated Aug 18 • 151k • 325
moonshotai/Kimi-VL-A3B-Instruct

Image-Text-to-Text • 16B • Updated Jul 30 • 124k • 242
moonshotai/Kimi-VL-A3B-Thinking

Image-Text-to-Text • 16B • Updated Aug 18 • 42.3k • 442

model-base-structure

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 171
Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 132
