Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2511.00279

💡HF Papers Live 5: Omni-Modal models

meituan-longcat/LongCat-Flash-Omni

Any-to-Any • 561B • Updated 26 days ago • 189 • 100
LongCat-Flash-Omni Technical Report

Paper • 2511.00279 • Published Oct 31 • 22
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17 • 89
nvidia/omnivinci

Feature Extraction • Updated Oct 29 • 6.83k • 163

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Paper • 2412.15322 • Published Dec 19, 2024 • 20
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 85
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Paper • 2505.02625 • Published May 5 • 22
Fast Text-to-Audio Generation with Adversarial Post-Training

Paper • 2505.08175 • Published May 13 • 25

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 90
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 34

💡HF Papers Live 5: Omni-Modal models

meituan-longcat/LongCat-Flash-Omni

Any-to-Any • 561B • Updated 26 days ago • 189 • 100
LongCat-Flash-Omni Technical Report

Paper • 2511.00279 • Published Oct 31 • 22
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17 • 89
nvidia/omnivinci

Feature Extraction • Updated Oct 29 • 6.83k • 163

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Paper • 2412.15322 • Published Dec 19, 2024 • 20
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 85
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Paper • 2505.02625 • Published May 5 • 22
Fast Text-to-Audio Generation with Adversarial Post-Training

Paper • 2505.08175 • Published May 13 • 25

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 90
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 34

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs