Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models Paper • 2605.28132 • Published 23 days ago • 25
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 23 days ago • 425
UniT: Unified Geometry Learning with Group Autoregressive Transformer Paper • 2605.21131 • Published about 1 month ago • 8
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13 • 272
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 170
Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents Paper • 2604.04979 • Published Apr 4 • 11
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation Paper • 2604.03118 • Published Apr 3 • 6
HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention Paper • 2603.28458 • Published Mar 30 • 44