pretrain
HuggingFaceTB/smollm-corpus • 237M rows • 14.6k downloads • 408 likes
opencsg/chinese-fineweb-edu • 84.6M rows • 16.2k downloads • 109 likes
wenge-research/yayi2_pretrain_data • 1.68M rows • 2.89k downloads • 57 likes
opencsg/chinese-cosmopedia • 1.62k downloads • 74 likes
31.1M rows • 47.5k downloads • 649 likes
Infi-MM/InfiMM-WebMath-40B • 22.8M rows • 1.14k downloads • 68 likes
63.1M rows • 1.42k downloads • 26 likes
gair-prox/open-web-math-pro • 2.58M rows • 708 downloads • 12 likes
argilla/FinePersonas-v0.1 • 42.1M rows • 10.1k downloads • 408 likes
45k downloads • 247 likes
175k downloads • 85 likes
opencsg/chinese-fineweb-edu-v2 • 188M rows • 10.7k downloads • 72 likes
OpenCoder-LLM/opc-fineweb-code-corpus • 101M rows • 9.83k downloads • 50 likes
OpenCoder-LLM/opc-fineweb-math-corpus • 5.24M rows • 1.98k downloads • 30 likes
470M rows • 44.5k downloads • 321 likes
CASIA-LM/ChineseWebText2.0 • 2k rows • 2.08k downloads • 27 likes
4.48B rows • 60k downloads • 709 likes
48.3M rows • 11.3k downloads • 344 likes
togethercomputer/RedPajama-Data-V2 • 4.76k downloads • 389 likes
217M rows • 37.6k downloads • 108 likes
BramVanroy/CommonCrawl-CreativeCommons • 739M rows • 786 downloads • 34 likes
1.29B rows • 59.5k downloads • 286 likes
157M rows • 5.97k downloads • 53 likes
ByteDance-Seed/mga-fineweb-edu • 846M rows • 2.58k downloads • 34 likes
2.55M rows • 16.8k downloads • 159 likes