About lr and evaluation

#6
by 141forever - opened

Hello, I have reproduced the results strictly following your training parameter settings. I found that with the learning rate set to 1e-7 the model barely learns anything, so the learning rate needs to be set between 1e-5 and 5e-5 for training to succeed.

BTW, may I ask if it's possible to share the parameter settings and code for the evaluation?

Hugging Face H4 org

Thanks for letting us know about the learning rate!

About the eval code, I just created a gist with the script we were using internally. Hope it's useful!

https://gist.github.com/cmpatino/2270db038f93e8714f8fb213ff60f48f
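Purely as an illustration of the kind of correctness check a Countdown-style eval needs (a sketch, not taken from the gist above), something like the following could work; the bare arithmetic-expression answer format, the `check_countdown` helper name, and the at-most-once rule for each number are all assumptions:

```python
import ast
from collections import Counter

def check_countdown(expr: str, numbers: list[int], target: int) -> bool:
    """Return True if `expr` uses only the given numbers (each at most once)
    and evaluates to `target`. Not hardened for untrusted input."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    # Integer literals actually used in the expression.
    used = [n.value for n in ast.walk(tree)
            if isinstance(n, ast.Constant) and isinstance(n.value, int)]
    if not used or Counter(used) - Counter(numbers):
        return False  # uses a number that was not provided, or uses one too often
    try:
        value = eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}})
    except Exception:
        return False  # e.g. division by zero or a malformed expression
    return abs(value - target) < 1e-6

print(check_countdown("25*3+7", [3, 7, 25, 50], 82))   # True
print(check_countdown("50+50", [3, 7, 25, 50], 100))   # False: 50 used twice
```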

cmpatino changed discussion status to closed

Thank you very much for providing the code.
At present, my student model is Llama-3.2-1B and the teacher model is Qwen3-4B. The training and testing data I am using is the Qwen3-4B version of the Countdown dataset (27.7K examples).
I am training on two GPUs, and the training hyperparameters are as follows:
```python
training_args = GOLDConfig(
    save_strategy="steps",
    save_steps=500,
    learning_rate=5e-5,
    warmup_ratio=0.05,
    per_device_train_batch_size=16,
    max_completion_length=512,
    teacher_model_name_or_path=teacher_name,  # Qwen3-4B
    teacher_tokenizer_name_or_path=teacher_name,
    bf16=True,
    use_uld_loss=True,
    uld_use_hybrid_loss=True,
    push_to_hub=False,
    report_to=[],
    lr_scheduler_type="cosine",
    num_train_epochs=5,
    max_steps=3000,  # max_steps > 0 takes precedence over num_train_epochs
    logging_steps=10,
    gradient_accumulation_steps=1,
    lmbda=1.0,
    beta=0.0,
    uld_crossentropy_weight=0.0,
    uld_distillation_weight=1.0,
)
```
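For completeness, this is roughly how the models and data above are loaded (a minimal sketch: the student/teacher hub ids and the dataset id are assumptions or placeholders, and in the actual script `teacher_name` is defined before the config):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "meta-llama/Llama-3.2-1B"  # student from the post (assumed hub id)
teacher_name = "Qwen/Qwen3-4B"            # teacher, referenced by the config above

student = AutoModelForCausalLM.from_pretrained(student_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(student_name)

# Placeholder id: substitute the actual Countdown (Qwen3-4B, ~27.7K rows) dataset.
train_dataset = load_dataset("your-org/countdown-qwen3-4b", split="train")
```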
Currently, the loss decreases from 1.8 to around 0.1. However, the trained model is unable to generate outputs in the required format. Moreover, its responses are almost irrelevant to the questions and do not form complete sentences.
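A minimal way to spot-check the generations (sketch only: the checkpoint path and the Countdown-style prompt are placeholders, and it assumes the saved tokenizer ships a chat template; otherwise tokenize the raw prompt directly):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "output/checkpoint-3000"  # placeholder path to a saved checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16, device_map="auto")

# Placeholder prompt; use the same prompt format as the training data.
prompt = "Using the numbers [3, 7, 25, 50], create an equation that equals 82."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```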
