Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL Paper • 2602.03773 • Published Feb 3 • 12
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning Paper • 2504.11354 • Published Apr 15, 2025 • 6
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 207
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10, 2025 • 48
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025 • 257
RAFT: A Real-World Few-Shot Text Classification Benchmark Paper • 2109.14076 • Published Sep 28, 2021 • 2
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code Paper • 2206.11249 • Published Jun 22, 2022
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages Paper • 2303.12582 • Published Mar 22, 2023 • 23
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements Paper • 2210.01970 • Published Sep 30, 2022 • 13