Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 Evaluate multilingual models using FineTasks • 85
FrenchBench Evaluation datasets • Collection • These datasets are used to evaluate models' performance on French using https://github.com/EleutherAI/lm-evaluation-harness (from the CroissantLLM paper) • 11 items • Updated Jun 7, 2024 • 7
Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots • 13.7k
Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots • 103