Update README.md
README.md
CHANGED
@@ -21,7 +21,7 @@ tags:
21   We’re thrilled to introduce AceMath-RL-Nemotron-7B, a math reasoning model trained entirely through reinforcement learning (RL), starting from the Deepseek-R1-Distilled-Qwen-7B. It delivers impressive results, achieving 69.0% Pass@1 accuracy on AIME 2024 (+13.5% gain) and 53.6% Pass@1 accuracy on AIME 2025 (+14.4% gain).
22   Interestingly, this math-focused RL training also improves the model’s coding accuracy on LiveCodeBench, reaching 44.4% Pass@1 (+6.8% gain), demonstrating the generalization capabilities of scaled RL training.
23
24 - We share our training recipe, training logs, and data curation details in our [BLOG](
25
26
27   ## Results
@@ -76,8 +76,15 @@ generated_ids = [
76   response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
77   ```
78
79   ## Correspondence to
80 - Yang Chen
81
82
83   ## License
@@ -21,7 +21,7 @@ tags:
21   We’re thrilled to introduce AceMath-RL-Nemotron-7B, a math reasoning model trained entirely through reinforcement learning (RL), starting from the Deepseek-R1-Distilled-Qwen-7B. It delivers impressive results, achieving 69.0% Pass@1 accuracy on AIME 2024 (+13.5% gain) and 53.6% Pass@1 accuracy on AIME 2025 (+14.4% gain).
22   Interestingly, this math-focused RL training also improves the model’s coding accuracy on LiveCodeBench, reaching 44.4% Pass@1 (+6.8% gain), demonstrating the generalization capabilities of scaled RL training.
23
24 + We share our training recipe, training logs, and data curation details in our [BLOG](https://research.nvidia.com/labs/adlr/acemath_rl/).
25
26
27   ## Results
@@ -76,8 +76,15 @@ generated_ids = [
76   response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
77   ```
78
79 +
80 + ## Usage Recommendations
81 +
82 + 1. Don't include a system prompt; instead, place all instructions directly in the user prompt.
83 + 2. We recommend using the following prompt format for math questions:<br>*<|begin▁of▁sentence|><|User|>{math_question}\nPlease reason step by step, and put your final answer within \boxed{}.<|Assistant|><think>\n*
84 +
85 +
86   ## Correspondence to
87 + Yang Chen ([email protected]),<br>Zihan Liu ([email protected]),<br>Chankyu Lee ([email protected]),<br>Wei Ping ([email protected])
88
89
90   ## License
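The prompt format added under "Usage Recommendations" in this change can be sketched as a small helper. This is a minimal illustration, not code from the README: the function name `build_math_prompt` and the constant names are hypothetical, and only the special tokens and the instruction sentence are taken from the diff. The diff does not say whether the tokenizer prepends the begin-of-sentence token on its own, so check for a duplicated token before tokenizing the result.

```python
# Hypothetical helper illustrating the recommended math prompt format.
# The token strings and the instruction sentence come from the diff;
# the names below are assumptions made for this sketch.
BOS = "<|begin▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"
INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def build_math_prompt(math_question: str) -> str:
    """Return a single-turn prompt with no system prompt.

    All instructions live in the user turn, and the prompt ends with
    an opening <think> tag followed by a newline.
    """
    return f"{BOS}{USER}{math_question}\n{INSTRUCTION}{ASSISTANT}<think>\n"


if __name__ == "__main__":
    # Show the raw string, including the escaped newlines.
    print(repr(build_math_prompt("What is 1 + 1?")))
```

Building the string by hand keeps the system prompt out entirely, which matches the first recommendation; a chat template would need to be checked to confirm it produces the same text.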