Update README.md
README.md CHANGED
@@ -24,7 +24,7 @@ img {
 
 ## Model Description
 
-Megatron-GPT 1.3B is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and 3 while 1.3B refers to the total trainable parameter count (1.3 Billion) [1, 2].
+Megatron-GPT 1.3B is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and GPT-3, while 1.3B refers to the total trainable parameter count (1.3 billion) [1, 2]. It has Tensor Parallelism (TP) of 1 and Pipeline Parallelism (PP) of 1, and should fit on a single NVIDIA GPU.
 
 This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).
 
@@ -95,17 +95,15 @@ print(sentences)
 
 ## Training Data
 
-The model was trained on ["The Piles" dataset prepared by Eleuther.AI](https://pile.eleuther.ai/).
+The model was trained on [The Pile dataset prepared by EleutherAI](https://pile.eleuther.ai/) [4].
 
 ## Evaluation results
 
-*Zero-shot performance.*
+*Zero-shot performance.* Evaluated using the [LM Evaluation Test Suite from AI21](https://github.com/AI21Labs/lm-evaluation).
 
 | ARC-Challenge | ARC-Easy | RACE-middle | RACE-high | Winogrande | RTE | BoolQA | HellaSwag | PiQA |
 | ------------- | -------- | ----------- | --------- | ---------- | --- | ------ | --------- | ---- |
-| 0.3012 | 0.4596 | 0.459 | 0.
-
-
+| 0.3012 | 0.4596 | 0.459 | 0.3797 | 0.5343 | 0.5451 | 0.5979 | 0.4443 | 0.6834 |
 
 ## References
 
@@ -115,6 +113,8 @@ The model was trained on ["The Piles" dataset prepared by Eleuther.AI](https://p
 
 [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
 
+[4] [The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)
+
 ## Licence
 
 License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.
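The updated Model Description states that the checkpoint uses TP=1 / PP=1 and should fit on a single NVIDIA GPU. A minimal loading sketch along those lines is shown below. It is an assumption-laden illustration, not part of this commit: it assumes the NeMo toolkit and PyTorch Lightning are installed, uses NeMo's `MegatronGPTModel.restore_from()` API, and the checkpoint filename is a placeholder; exact helper names such as `NLPDDPStrategy` can differ between NeMo releases.

```python
# Illustrative sketch only: load a released .nemo checkpoint on a single GPU.
# The checkpoint filename below is a placeholder, not taken from this commit.
from pytorch_lightning import Trainer
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

# With TP=1 and PP=1, a single-device trainer is sufficient to host the model.
trainer = Trainer(
    devices=1,
    accelerator="gpu",
    num_nodes=1,
    precision=16,
    strategy=NLPDDPStrategy(),
)

# restore_from() unpacks the serialized .nemo archive (config + weights) into a model instance.
model = MegatronGPTModel.restore_from(
    restore_path="nemo_gpt1.3B_fp16.nemo",  # placeholder path
    trainer=trainer,
)
model.eval()
```

A generation snippet like the one already in the README (the `print(sentences)` context visible in the second hunk) would then run against this loaded model.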