About lr and evaluation

#6
by 141forever - opened

Hello, I have reproduced the results strictly following your training parameter settings. I found that with the learning rate set to 1e-7 the model barely learns anything, so the learning rate needs to be set between 1e-5 and 5e-5 for training to succeed.

BTW, may I ask if it's possible to share the parameter settings and code for the evaluation?

Hugging Face H4 org

Thanks for letting us know about the learning rate!

About the eval code, I just created a gist with the script we were using internally. Hope it's useful!

https://gist.github.com/cmpatino/2270db038f93e8714f8fb213ff60f48f
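Purely as an illustration of the kind of correctness check a Countdown-style eval needs (a sketch, not taken from the gist above), something like the following could work; the bare arithmetic-expression answer format, the `check_countdown` helper name, and the at-most-once rule for each number are all assumptions:

```python
import ast
from collections import Counter

def check_countdown(expr: str, numbers: list[int], target: int) -> bool:
    """Return True if `expr` uses only the given numbers (each at most once)
    and evaluates to `target`. Not hardened for untrusted input."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    # Integer literals actually used in the expression.
    used = [n.value for n in ast.walk(tree)
            if isinstance(n, ast.Constant) and isinstance(n.value, int)]
    if not used or Counter(used) - Counter(numbers):
        return False  # uses a number that was not provided, or uses one too often
    try:
        value = eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}})
    except Exception:
        return False  # e.g. division by zero or a malformed expression
    return abs(value - target) < 1e-6

print(check_countdown("25*3+7", [3, 7, 25, 50], 82))   # True
print(check_countdown("50+50", [3, 7, 25, 50], 100))   # False: 50 used twice
```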

cmpatino changed discussion status to closed

Thank you very much for providing the code.
At present, my student model is Llama-3.2-1B and the teacher model is Qwen3-4B. The training and testing data I am using is the Qwen3-4B version of the Countdown dataset (27.7K examples).
I am training on two GPUs, and the training hyperparameters are as follows:
```python
training_args = GOLDConfig(
    save_strategy="steps",
    save_steps=500,
    learning_rate=5e-5,
    warmup_ratio=0.05,
    per_device_train_batch_size=16,
    max_completion_length=512,
    teacher_model_name_or_path=teacher_name,  # Qwen3-4B
    teacher_tokenizer_name_or_path=teacher_name,
    bf16=True,
    use_uld_loss=True,
    uld_use_hybrid_loss=True,
    push_to_hub=False,
    report_to=[],
    lr_scheduler_type="cosine",
    num_train_epochs=5,
    max_steps=3000,  # max_steps > 0 takes precedence over num_train_epochs
    logging_steps=10,
    gradient_accumulation_steps=1,
    lmbda=1.0,
    beta=0.0,
    uld_crossentropy_weight=0.0,
    uld_distillation_weight=1.0,
)
```
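For completeness, this is roughly how the models and data above are loaded (a minimal sketch: the student/teacher hub ids and the dataset id are assumptions or placeholders, and in the actual script `teacher_name` is defined before the config):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "meta-llama/Llama-3.2-1B"  # student from the post (assumed hub id)
teacher_name = "Qwen/Qwen3-4B"            # teacher, referenced by the config above

student = AutoModelForCausalLM.from_pretrained(student_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(student_name)

# Placeholder id: substitute the actual Countdown (Qwen3-4B, ~27.7K rows) dataset.
train_dataset = load_dataset("your-org/countdown-qwen3-4b", split="train")
```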
Currently, the loss decreases from 1.8 to around 0.1. However, the trained model is unable to generate outputs in the required format. Moreover, its responses are almost irrelevant to the questions and do not form complete sentences.
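A minimal way to spot-check the generations (sketch only: the checkpoint path and the Countdown-style prompt are placeholders, and it assumes the saved tokenizer ships a chat template; otherwise tokenize the raw prompt directly):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "output/checkpoint-3000"  # placeholder path to a saved checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16, device_map="auto")

# Placeholder prompt; use the same prompt format as the training data.
prompt = "Using the numbers [3, 7, 25, 50], create an equation that equals 82."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```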
