- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25
Collections
Discover the best community collections!
Collections including paper arXiv:2511.16624
- SAM 3D: 3Dfy Anything in Images
  Paper • 2511.16624 • Published • 113
- Orient Anything V2: Unifying Orientation and Rotation Understanding
  Paper • 2601.05573 • Published • 9
- A generalizable 3D framework and model for self-supervised learning in medical imaging
  Paper • 2501.11755 • Published
- SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks
  Paper • 2311.11969 • Published • 4
- SAM 3: Segment Anything with Concepts
  Paper • 2511.16719 • Published • 130
- Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
  Paper • 2512.08765 • Published • 132
- Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
  Paper • 2512.04677 • Published • 171
- LongCat-Image Technical Report
  Paper • 2512.07584 • Published • 23
- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 75
- rStar2-Agent: Agentic Reasoning Technical Report
  Paper • 2508.20722 • Published • 117
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 53
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
  Paper • 2509.12201 • Published • 106
- Arbitrary-steps Image Super-resolution via Diffusion Inversion
  Paper • 2412.09013 • Published • 13
- Deep Researcher with Test-Time Diffusion
  Paper • 2507.16075 • Published • 68
- ∇NABLA: Neighborhood Adaptive Block-Level Attention
  Paper • 2507.13546 • Published • 125
- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 91
- Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
  Paper • 2508.09789 • Published • 5
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
  Paper • 2508.13186 • Published • 19
- ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
  Paper • 2508.04038 • Published • 1
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 48
- Neural-Driven Image Editing
  Paper • 2507.05397 • Published • 27
- π^3: Scalable Permutation-Equivariant Visual Geometry Learning
  Paper • 2507.13347 • Published • 66
- MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
  Paper • 2507.10065 • Published • 25
- From One to More: Contextual Part Latents for 3D Generation
  Paper • 2507.08772 • Published • 26