Add library_name, link to code, and update pipeline tag
#1
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,15 +1,23 @@
 ---
-
+base_model:
+- internlm/internlm2-7b
 datasets:
 - cerebras/SlimPajama-627B
 language:
 - en
-
-
+license: mit
+metrics:
+- accuracy
+pipeline_tag: text-generation
+library_name: transformers
 ---
 
 # Meta-rater Language Model (7.2B Parameters, 150B Tokens)
 
+This repository contains the model described in the paper [Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models](https://huggingface.co/papers/2504.14194).
+
+Code: https://github.com/opendatalab/Meta-rater
+
 ## Model Description
 
 This is a 7.2B parameter transformer-based decoder-only language model trained from scratch on 150B tokens selected from SlimPajama dataset using the **Meta-rater** framework with all 25 quality scores. This represents the largest and most capable model in the Meta-rater research, demonstrating maximal benefits of quality-driven data selection at scale.
@@ -224,4 +232,4 @@ Please refer to the license terms of the original SlimPajama dataset and follow
 
 ## Contact
 
-For questions or issues, please contact the authors or open an issue in the repository.
+For questions or issues, please contact the authors or open an issue in the repository.
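The two metadata keys this PR adds, `library_name: transformers` and `pipeline_tag: text-generation`, determine which loading snippet and task filter the Hub shows for the model, and they signal that the checkpoint loads through the standard `transformers` text-generation pipeline. Below is a minimal sketch under that assumption; the repo id `your-org/meta-rater-7b-150B` is a placeholder (the PR does not name the Hub path), and `trust_remote_code=True` is included only because the declared base model, `internlm/internlm2-7b`, ships custom modeling code.

```python
# Minimal usage sketch implied by pipeline_tag: text-generation and
# library_name: transformers. The repo id below is a PLACEHOLDER, not the
# confirmed Hub path of this model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="your-org/meta-rater-7b-150B",  # placeholder repo id (assumption)
    trust_remote_code=True,   # internlm2-based checkpoints may ship custom modeling code
    device_map="auto",        # spread the 7.2B-parameter weights across available devices
)

out = generator(
    "Data selection for language model pre-training matters because",
    max_new_tokens=64,
    do_sample=False,
)
print(out[0]["generated_text"])
```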