Update README.md
README.md CHANGED
@@ -24,7 +24,7 @@ img {
 
 ## Model Description
 
-Megatron-GPT 1.3B is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and 3 while 1.3B refers to the total trainable parameter count (1.3 Billion) [1, 2].
+Megatron-GPT 1.3B is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and GPT-3, while 1.3B refers to the total trainable parameter count (1.3 billion) [1, 2]. It has Tensor Parallelism (TP) of 1 and Pipeline Parallelism (PP) of 1, and should fit on a single NVIDIA GPU.
 
 This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).
 
@@ -95,17 +95,15 @@ print(sentences)
 
 ## Training Data
 
-The model was trained on ["The Piles" dataset prepared by Eleuther.AI](https://pile.eleuther.ai/).
+The model was trained on [The Pile dataset prepared by EleutherAI](https://pile.eleuther.ai/) [4].
 
 ## Evaluation results
 
-*Zero-shot performance.*
+*Zero-shot performance.* Evaluated using the [LM Evaluation Test Suite from AI21](https://github.com/AI21Labs/lm-evaluation).
 
 | ARC-Challenge | ARC-Easy | RACE-middle | RACE-high | Winogrande | RTE | BoolQA | HellaSwag | PiQA |
 | ------------- | -------- | ----------- | --------- | ---------- | --- | ------ | --------- | ---- |
-| 0.3012 | 0.4596 | 0.459 | 0.
-
-
+| 0.3012 | 0.4596 | 0.459 | 0.3797 | 0.5343 | 0.5451 | 0.5979 | 0.4443 | 0.6834 |
 
 ## References
 
@@ -115,6 +113,8 @@ The model was trained on ["The Piles" dataset prepared by Eleuther.AI](https://p
 
 [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
 
+[4] [The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)
+
 ## Licence
 
 License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.
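The updated Model Description states that the checkpoint uses TP=1 / PP=1 and should fit on a single NVIDIA GPU. A minimal loading sketch along those lines is shown below. It is an assumption-laden illustration, not part of this commit: it assumes the NeMo toolkit and PyTorch Lightning are installed, uses NeMo's `MegatronGPTModel.restore_from()` API, and the checkpoint filename is a placeholder; exact helper names such as `NLPDDPStrategy` can differ between NeMo releases.

```python
# Illustrative sketch only: load a released .nemo checkpoint on a single GPU.
# The checkpoint filename below is a placeholder, not taken from this commit.
from pytorch_lightning import Trainer
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

# With TP=1 and PP=1, a single-device trainer is sufficient to host the model.
trainer = Trainer(
    devices=1,
    accelerator="gpu",
    num_nodes=1,
    precision=16,
    strategy=NLPDDPStrategy(),
)

# restore_from() unpacks the serialized .nemo archive (config + weights) into a model instance.
model = MegatronGPTModel.restore_from(
    restore_path="nemo_gpt1.3B_fp16.nemo",  # placeholder path
    trainer=trainer,
)
model.eval()
```

A generation snippet like the one already in the README (the `print(sentences)` context visible in the second hunk) would then run against this loaded model.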