Croc-Prog-HF's Collections
Benchmark datasets
Updated 5 days ago
cais/hle • Benchmark • Updated Jan 20 • 2.5k • 44.7k • 757
Qwen/DeepPlanning • Viewer • Updated Mar 3 • 2.14k • 706 • 193
gaia-benchmark/GAIA • Viewer • Updated Oct 28, 2025 • 932 • 33.8k • 634
BLINK-Benchmark/BLINK • Viewer • Updated Sep 3, 2025 • 3.81k • 15.3k • 41
openai/gsm8k • Benchmark • Updated 12 days ago • 17.6k • 758k • 1.23k
allenai/olmOCR-bench • Benchmark • Updated Feb 19 • 3.44k • 168
TIGER-Lab/MMLU-Pro • Benchmark • Updated 24 days ago • 12.1k • 125k • 461
openai/openai_humaneval • Viewer • Updated Jan 4, 2024 • 164 • 244k • 376
Muennighoff/mbpp • Viewer • Updated Oct 20, 2022 • 1.4k • 1.9k • 22
bigcode/bigcodebench • Viewer • Updated Apr 30, 2025 • 5.7k • 40.8k • 77
livecodebench/test_generation • Viewer • Updated Jun 13, 2024 • 442 • 1.17k • 7
ScaleAI/SWE-bench_Pro • Benchmark • Updated Feb 23 • 731 • 834k • 70
SWE-bench/SWE-bench_Verified • Benchmark • Updated Feb 27 • 500 • 120k • 24
mteb/arguana • Benchmark • Updated Feb 22 • 11.5k • 11.8k • 5
MathArena/hmmt_feb_2026 • Benchmark • Updated Feb 19 • 33 • 1.53k • 1
Idavidrein/gpqa • Benchmark • Updated 30 days ago • 1.25k • 111k • 407
likaixin/ScreenSpot-Pro • Benchmark • Updated 17 days ago • 5.94k • 60
harborframework/terminal-bench-2.0 • Benchmark • Updated Feb 17 • 2.47k • 18
FutureMa/EvasionBench • Benchmark • Updated Feb 19 • 16.7k • 167 • 84
internlm/WildClawBench • Updated 3 days ago • 8.59k • 46
FINAL-Bench/World-Model • Viewer • Updated 6 days ago • 100 • 1.25k • 25