Open LLM Leaderboard

Team

community

https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard

Activity Feed

AI & ML interests

Evaluating open LLMs

AdinaY

submitted a paper to Daily Papers 4 days ago

TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification

Paper • 2604.14531 • Published 5 days ago • 6

open-llm-bot

updated a dataset 5 days ago

open-llm-leaderboard/requests

Preview • Updated 5 days ago • 54.2k • 12

victor

posted an update 7 days ago

Post

4750

Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥

I think we have it: our open source Claude Code = GLM-5.1 + Pi (https://pi.dev/) - Built a Three.js racing game to eval and it's extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and proved a winding bug with vector math without ever seeing the screen

- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed
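
The two vector-math checks mentioned above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the code the model produced: the function names are hypothetical, the up axis is assumed to be +Y (the Three.js default), and curvature is estimated as turning angle per unit arc length on a 2D polyline.

```python
import math

def sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def face_normal(p0, p1, p2):
    """Unnormalized normal of a triangle; its direction depends on winding order."""
    return cross(sub(p1, p0), sub(p2, p0))

def points_down(p0, p1, p2, up=(0.0, 1.0, 0.0)):
    """True if the triangle's normal opposes the assumed up axis (+Y here)."""
    return dot(face_normal(p0, p1, p2), up) < 0.0

def curvature(p_prev, p, p_next):
    """Signed turning angle at p divided by the mean adjacent segment length (2D)."""
    ax, ay = p[0] - p_prev[0], p[1] - p_prev[1]
    bx, by = p_next[0] - p[0], p_next[1] - p[1]
    angle = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    arc = 0.5 * (math.hypot(ax, ay) + math.hypot(bx, by))
    return angle / arc
```

With these definitions, a flat road triangle wound (0,0,0) → (1,0,0) → (0,0,1) has normal (0, -1, 0), so `points_down` flags it, and swapping the last two vertices flips the normal back up. For points sampled densely on a circle of radius R, `curvature` approaches 1/R, which is the kind of value one could map to a cornering speed.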

You are going to hear about this model a lot in the next months - open source let's go - and thanks z-ai🚀🚀

4 replies

AdinaY

submitted a paper to Daily Papers 21 days ago

KAT-Coder-V2 Technical Report

Paper • 2603.27703 • Published 23 days ago • 10

clefourrier

authored a paper about 1 month ago

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published Mar 12 • 65

AdinaY

submitted 2 papers to Daily Papers about 1 month ago

Training Language Models via Neural Cellular Automata

Paper • 2603.10055 • Published Mar 9 • 8

Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

Paper • 2603.05438 • Published Mar 5 • 40

albertvillanova

posted an update about 2 months ago

Post

2403

🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.

This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)

We’re excited to see what the community builds on top of this.

If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0

victor

submitted a paper to Daily Papers about 2 months ago

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

Paper • 2602.21548 • Published Feb 25 • 50

clefourrier

in open-llm-leaderboard/results 2 months ago

Create README.md

#34 opened 3 months ago by Highgroundbkk

lewtun

submitted a paper to Daily Papers 2 months ago

Single-minus gluon tree amplitudes are nonzero

Paper • 2602.12176 • Published Feb 12 • 8

AdinaY

posted an update 2 months ago

Post

3651

MiniMax M2.5 is now available on the hub 🚀

MiniMaxAI/MiniMax-M2.5

✨ 229B - Modified MIT license
✨ 37% faster than M2.1
✨ ~$1/hour at 100 TPS
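
As a rough reading of the "~$1/hour at 100 TPS" figure, the arithmetic below converts it to a cost per million tokens. It is a back-of-the-envelope sketch that assumes TPS means generated tokens per second and that generation runs continuously for the full hour; the function name is hypothetical.

```python
def cost_per_million_tokens(dollars_per_hour, tokens_per_second):
    """Convert an hourly price at a sustained token rate into dollars per 1M tokens."""
    tokens_per_hour = tokens_per_second * 3600  # 100 TPS -> 360,000 tokens/hour
    return dollars_per_hour / tokens_per_hour * 1_000_000

# Under the assumptions above: $1/hour at 100 TPS is roughly $2.78 per million tokens.
estimate = cost_per_million_tokens(1.0, 100)
```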

2 replies

lewtun

submitted a paper to Daily Papers 2 months ago

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Paper • 2602.03773 • Published Feb 3 • 13

AdinaY

posted an update 2 months ago

Post

743

RynnBrain 🤖 a physics-aware embodied brain for robots from Alibaba DAMO

https://huggingface.co/collections/Alibaba-DAMO-Academy/rynnbrain

✨ 2B/8B/30B (3B active)
✨ Apache 2.0
✨ Understands egocentric scenes with strong spatial awareness
✨ Tracks objects and motion over time

2 replies

AdinaY

posted an update 2 months ago

Post

4125

Game on 🎮🚀

While Seedance 2.0’s videos are all over the timeline, DeepSeek quietly pushed a new model update in its app.

GLM-5 from Z.ai adds more momentum.

Ming-flash-omni from Ant Group, MiniCPM-SALA from OpenBMB, and the upcoming MiniMax M2.5 keep the heat on 🔥

Spring Festival is around the corner,
no one’s sleeping!

✨ More releases coming, stay tuned
https://huggingface.co/collections/zh-ai-community/2026-february-china-open-source-highlights

albertvillanova

posted an update 2 months ago

Post

1863

5 years already working on democratizing AI 🤗
Grateful to be part of such an awesome team making it happen every day.

AdinaY

posted an update 2 months ago

Post

3951

Ming-flash-omni 2.0 🚀 New open omni-MLLM released by Ant Group

inclusionAI/Ming-flash-omni-2.0

✨ MIT license
✨ MoE - 100B/6B active
✨ Zero-shot voice cloning + controllable audio
✨ Fine-grained visual knowledge grounding

2 replies

AdinaY

posted an update 2 months ago

Post

803

LLaDA 2.1 is out 🔥 A new series of MoE diffusion language models released by Ant Group

inclusionAI/LLaDA2.1-mini
inclusionAI/LLaDA2.1-flash

✨ LLaDA2.1-mini: 16B - Apache 2.0
✨ LLaDA2.1-flash: 100B - Apache 2.0
✨ Both deliver editable generation, RL-trained diffusion reasoning and fast inference

2 replies

AdinaY

posted an update 3 months ago

Post

2620

AI for science is moving fast 🚀

Intern-S1-Pro 🔬 a MoE multimodal scientific reasoning model from Shanghai AI Lab

internlm/Intern-S1-Pro

✨ 1T total / 22B active
✨ Apache 2.0
✨ SoTA scientific reasoning performance
✨ FoPE enables scalable modeling of long physical time series (10⁰–10⁶)

2 replies

AdinaY

posted an update 3 months ago

Post

1409

✨ China’s open source AI ecosystem has entered a new phase

https://huggingface.co/blog/huggingface/one-year-since-the-deepseek-moment-blog-3

One year after the “DeepSeek Moment,” open source has become the default. Models, research, infrastructure, and deployment are increasingly shared to support large-scale, system-level integration.

This final blog examines how leading Chinese AI organizations are evolving, and what this implies for the future of open source.

Team members 17
