Collections including paper arxiv:2508.10104

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 89
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22 • 36
Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30 • 80
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 240

Representation Learning

End-to-End Vision Tokenizer Tuning

Paper • 2505.10562 • Published May 15 • 22
Global and Local Entailment Learning for Natural World Imagery

Paper • 2506.21476 • Published Jun 26 • 1
DINOv3

Paper • 2508.10104 • Published Aug 13 • 285
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

Paper • 2509.01363 • Published Sep 1 • 58

Interesting Papers

ReZero: Enhancing LLM search ability by trying one-more-time

Paper • 2504.11001 • Published Apr 15 • 16
FonTS: Text Rendering with Typography and Style Controls

Paper • 2412.00136 • Published Nov 28, 2024 • 1
GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 97
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 158

interesting papers

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 249
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 259
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 429
MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 301

interesting architecture

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 28
Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 90
Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31 • 22
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13 • 8

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5 • 74
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Paper • 2506.04207 • Published Jun 4 • 48
MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4 • 80
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Paper • 2506.03147 • Published Jun 3 • 58

microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated May 1 • 7.7k • 1.22k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17 • 140 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 63

Research Papers

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Paper • 2408.03314 • Published Aug 6, 2024 • 63
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning

Paper • 2502.15425 • Published Feb 21 • 9
EgoLife: Towards Egocentric Life Assistant

Paper • 2503.03803 • Published Mar 5 • 46
Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3 • 85

Research Papers/Reviews/Literature

Daily research papers and reviews, including older relevant content.

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 61
RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18 • 153
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Paper • 2503.15265 • Published Mar 19 • 46
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published Mar 18 • 50

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 103
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 90
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark

Paper • 2405.19707 • Published May 30, 2024 • 8
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Paper • 2410.08049 • Published Oct 10, 2024 • 8
