Fine-tuning roadmap

#18
by RonanMcGovern - opened

What fine-tuning library is likely to first be able to support deepseek v3?

Transformers did not have v2 integrated.

Supporting the MoE layers might take work, plus the latent attention (MLA) and multi-token prediction (MTP) heads. Then there's also supporting fp8 as the base model on which to train LoRAs…
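For what it's worth, the LoRA-on-a-frozen-base idea itself is library-agnostic: the base weights (fp8 or otherwise) stay frozen, and only small low-rank matrices are trained. A minimal PyTorch sketch of that pattern (the class name, rank, and alpha here are arbitrary illustrations, not DeepSeek's or PEFT's actual implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank (LoRA) update."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base weights stay frozen (could be fp8/quantized)
        # Low-rank factors: B @ A has shape (out_features, in_features)
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus scaled low-rank correction: W x + scale * (B A) x
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

layer = LoRALinear(nn.Linear(64, 64))
out = layer(torch.randn(2, 64))
# Only the LoRA factors are trainable; the base layer is untouched.
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
```

The open question in this thread is less the LoRA math and more wiring this up through the MoE experts, MLA, and fp8 kernels in a real training stack.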

Thanks, and thanks for the model.

I have the same question: how do you fine-tune DeepSeek-V3? Could a guide be provided?

+1 for finetuning script


Hello, I am running DeepSeek V3 0324 on Nanogpt and wanted to know if there is a fine-tuning setup on Hugging Face I can use for DeepSeek myself. In the demo it acts basically the same as the original DeepSeek model, while on Nanogpt it's a bit different, despite having no censorship or anything, just the raw model. Thank you.
