Update README.md
README.md
CHANGED
@@ -140,7 +140,8 @@ Tülu3 is designed for state-of-the-art performance on a diversity of tasks in a
 |-----------|-------------------|
 | **Base Model** | [meta-llama/llama-3.1-405B](https://huggingface.co/meta-llama/llama-3.1-405B) |
 | **SFT** | [allenai/llama-3.1-Tulu-3-405B-SFT](https://huggingface.co/allenai/llama-3.1-Tulu-3-405B-SFT) |
-| **
+| **DPO** | [allenai/llama-3.1-Tulu-3-405B-DPO](https://huggingface.co/allenai/llama-3.1-Tulu-3-405B-DPO) |
+| **Final Model (RLVR)** | [allenai/llama-3.1-Tulu-3-405B](https://huggingface.co/allenai/llama-3.1-Tulu-3-405B) |
 | **Reward Model (RM)**| (Same as 8B)
 
 