Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models Paper • 2511.02650 • Published Nov 4 • 9
DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents Paper • 2510.19336 • Published Oct 22 • 16
AndesVL Collection AndesVL is a suite of mobile-optimized Multimodal Large Language Models (MLLMs) with 0.6B to 4B parameters. • 8 items • Updated Oct 15 • 11
AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model Paper • 2510.11496 • Published Oct 13 • 3
Article OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models Jul 18 • 50
SmolLM3 pretraining datasets Collection Datasets used in SmolLM3 pretraining • 15 items • Updated Aug 12 • 39
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26 • 75
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published Jun 18 • 39
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning Paper • 2505.10557 • Published May 15 • 47
GenX: Mastering Code and Test Generation with Execution Feedback Paper • 2412.13464 • Published Dec 18, 2024 • 1
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published Apr 1 • 66
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 429
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated 7 days ago • 81