- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25
Collections
Discover the best community collections!
Collections including paper arXiv:2511.16624
- SAM 3D: 3Dfy Anything in Images
  Paper • 2511.16624 • Published • 113
- Orient Anything V2: Unifying Orientation and Rotation Understanding
  Paper • 2601.05573 • Published • 9
- A generalizable 3D framework and model for self-supervised learning in medical imaging
  Paper • 2501.11755 • Published
- SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks
  Paper • 2311.11969 • Published • 4
- SAM 3: Segment Anything with Concepts
  Paper • 2511.16719 • Published • 130
- Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
  Paper • 2512.08765 • Published • 132
- Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
  Paper • 2512.04677 • Published • 171
- LongCat-Image Technical Report
  Paper • 2512.07584 • Published • 23
- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 75
- rStar2-Agent: Agentic Reasoning Technical Report
  Paper • 2508.20722 • Published • 117
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 53
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
  Paper • 2509.12201 • Published • 106
- Arbitrary-steps Image Super-resolution via Diffusion Inversion
  Paper • 2412.09013 • Published • 13
- Deep Researcher with Test-Time Diffusion
  Paper • 2507.16075 • Published • 68
- ∇NABLA: Neighborhood Adaptive Block-Level Attention
  Paper • 2507.13546 • Published • 125
- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 91
- Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
  Paper • 2508.09789 • Published • 5
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
  Paper • 2508.13186 • Published • 19
- ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
  Paper • 2508.04038 • Published • 1
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 48
- Neural-Driven Image Editing
  Paper • 2507.05397 • Published • 27
- π^3: Scalable Permutation-Equivariant Visual Geometry Learning
  Paper • 2507.13347 • Published • 66
- MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
  Paper • 2507.10065 • Published • 25
- From One to More: Contextual Part Latents for 3D Generation
  Paper • 2507.08772 • Published • 26