Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer Paper • 2503.02495 • Published Mar 4 • 9
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models Paper • 2511.18890 • Published 16 days ago • 29
World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models Paper • 2511.22787 • Published 13 days ago • 8