Model Card for Qwen3-4B-GKD-Tulu

This model is a fine-tuned version of burtenshaw/Qwen3-4B-SFT-Codeforces. It has been trained using TRL.

Quick start

from transformers import pipeline

# Example prompt; replace it with your own question.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load the model into a text-generation pipeline on a CUDA GPU.
generator = pipeline("text-generation", model="burtenshaw/Qwen3-4B-GKD-Tulu", device="cuda")
# Send the prompt as a single-turn chat and return only the newly generated text.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])

Training procedure

This model was trained with Generalized Knowledge Distillation (GKD), a method introduced in the paper "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes".
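
As a rough illustration of the kind of objective GKD can optimize, the sketch below computes the generalized Jensen-Shannon divergence from the GKD paper between a teacher and a student next-token distribution. It is a minimal pure-Python sketch, not the TRL implementation: the function names, the toy probabilities, and the choice of beta = 0.5 are all assumptions made for illustration, and this card does not state which divergence or hyperparameters were used to train this model.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(teacher, student, beta=0.5):
    """Generalized JSD: beta * KL(teacher || m) + (1 - beta) * KL(student || m),
    where m = beta * teacher + (1 - beta) * student and 0 < beta < 1."""
    mixture = [beta * t + (1 - beta) * s for t, s in zip(teacher, student)]
    return (beta * kl_divergence(teacher, mixture)
            + (1 - beta) * kl_divergence(student, mixture))


# Hypothetical next-token distributions over a three-token vocabulary.
teacher_probs = [0.7, 0.2, 0.1]
student_probs = [0.5, 0.3, 0.2]

print(generalized_jsd(teacher_probs, student_probs, beta=0.5))
```

In the paper's setup this divergence is averaged over token positions, and the sequences it is evaluated on are a mix of dataset outputs and outputs sampled from the student itself, which is what makes the distillation on-policy.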

Framework versions

TRL: 0.24.0.dev0
Transformers: 4.57.1
PyTorch: 2.9.0
Datasets: 4.3.0
Tokenizers: 0.22.1

Citations

Cite GKD as:

@inproceedings{agarwal2024on-policy,
    title        = {{On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes}},
    author       = {Rishabh Agarwal and Nino Vieillard and Yongchao Zhou and Piotr Stanczyk and Sabela Ramos Garea and Matthieu Geist and Olivier Bachem},
    year         = 2024,
    booktitle    = {The Twelfth International Conference on Learning Representations, {ICLR} 2024, Vienna, Austria, May 7-11, 2024},
    publisher    = {OpenReview.net},
    url          = {https://openreview.net/forum?id=3zKtaqxLhW},
}

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

Safetensors

Model size: 4B params
Tensor type: BF16

Model tree for burtenshaw/Qwen3-4B-GKD-Tulu

Base model: Qwen/Qwen3-4B-Base
Finetuned: Qwen/Qwen3-4B
Finetuned: burtenshaw/Qwen3-4B-SFT-Codeforces
Finetuned: burtenshaw/Qwen3-4B-GKD-Tulu (this model)
Quantizations: 1 model

Evaluation results

Benchmark suite: Artificial Analysis Benchmarks
Source: Artificial Analysis API

Artificial Analysis Intelligence Index: 28.300
Artificial Analysis Coding Index: 21.800
Artificial Analysis Math Index: 19.000
MMLU-Pro: 0.743
GPQA: 0.589
HLE: 0.042
LiveCodeBench: 0.406
SciCode: 0.226
MATH-500: 0.904
AIME: 0.747
