---
license: mit
language:
- en
pipeline_tag: text-generation
datasets:
- FreedomIntelligence/medical-o1-reasoning-SFT
base_model:
- unsloth/DeepSeek-R1-Distill-Llama-8B
---

# DeepSeek-R1-Distill-Llama-8B - Fine-Tuned for Medical Chain-of-Thought Reasoning

## Model Overview

The **DeepSeek-R1-Distill-Llama-8B** model has been fine-tuned for medical chain-of-thought (CoT) reasoning. This fine-tuning enhances the model's ability to generate structured, concise, and accurate medical reasoning outputs. The model was trained on a 500-sample subset of the **medical-o1-reasoning-SFT** dataset, with optimizations including **4-bit quantization** and **LoRA adapters** to improve efficiency and reduce memory usage.

### Key Features

- **Base Model:** [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B)
- **Fine-Tuning Objective:** Adaptation for structured, step-by-step medical reasoning tasks.
- **Training Dataset:** 500 samples from the **medical-o1-reasoning-SFT** dataset.
- **Tools Used:**
  - **Unsloth:** Accelerates training by 2x.
  - **4-bit Quantization:** Reduces model memory usage.
  - **LoRA Adapters:** Enable parameter-efficient fine-tuning.
- **Training Time:** 44 minutes.

### Performance Improvements

- **Response Length:** Reduced from an average of 450 words to 150 words, improving conciseness.
- **Reasoning Style:** Shifted from verbose explanations to more focused, structured reasoning.
- **Answer Format:** Transitioned from bulleted lists to paragraph-style answers for clarity.

## Intended Use

This model is designed for use by:

- **Medical professionals** requiring structured diagnostic reasoning.
- **Researchers** seeking assistance in medical knowledge extraction.
- **Developers** integrating medical CoT reasoning into clinical, treatment-planning, and educational applications.

Typical use cases include:

- Clinical diagnostics
- Treatment planning
- Medical education and training
- Research assistance

## Training Details

### Key Components:

- **Model:** [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B)
- **Dataset:** **medical-o1-reasoning-SFT** (500 samples)
- **Training Tools:**
  - **Unsloth:** Optimized training for faster results (2x speedup).
  - **4-bit Quantization:** Optimized memory usage for efficient training.
  - **LoRA Adapters:** Enable lightweight fine-tuning with reduced computational costs.

### Fine-Tuning Process:

1. **Install Required Packages:**

Installed necessary libraries, including **unsloth** and **kaggle**.

2. **Authentication:**

Authenticated with **Hugging Face Hub** and **Weights & Biases** for tracking experiments and versioning.

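
A minimal sketch of this step; the tokens and the Weights & Biases project name below are placeholders, not values from the original notebook:

```python
from huggingface_hub import login
import wandb

# Placeholders -- supply your own credentials (e.g. via environment variables or notebook secrets).
login(token="hf_...")                           # Hugging Face Hub token
wandb.login(key="...")                          # Weights & Biases API key
wandb.init(project="deepseek-r1-medical-cot")   # hypothetical project name
```
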

3. **Model Initialization:**

Initialized the base model with **4-bit quantization** and a sequence length of up to 2048 tokens.

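
A sketch of the initialization using Unsloth's `FastLanguageModel`; the `dtype=None` auto-detection is an assumption about the exact arguments used:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,   # maximum sequence length used for fine-tuning
    dtype=None,            # let Unsloth pick float16/bfloat16 for the GPU
    load_in_4bit=True,     # 4-bit quantization to reduce memory usage
)
```
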

4. **Pre-Fine-Tuning Inference:**

Conducted an initial inference to establish the model’s baseline performance on a medical question.

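
A minimal sketch of the baseline check; the question text and generation settings are illustrative rather than the exact ones from the notebook:

```python
FastLanguageModel.for_inference(model)  # switch Unsloth into inference mode

question = (
    "A 61-year-old woman reports involuntary urine loss when coughing or sneezing. "
    "What is the most likely diagnosis?"
)
prompt = f"### Question:\n{question}\n\n### Response:\n<think>"

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
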

5. **Dataset Preparation:**

Structured and formatted the training data using a custom template tailored to medical CoT reasoning tasks.

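
A sketch of the formatting step. The column names (`Question`, `Complex_CoT`, `Response`) follow the dataset's English split, but the template wording below is illustrative rather than the exact one used:

```python
from datasets import load_dataset

# Illustrative CoT template; the notebook's exact wording may differ.
train_prompt = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

def format_examples(examples):
    texts = []
    for question, cot, answer in zip(
        examples["Question"], examples["Complex_CoT"], examples["Response"]
    ):
        # Append EOS so the model learns where a completed answer ends.
        texts.append(train_prompt.format(question, cot, answer) + tokenizer.eos_token)
    return {"text": texts}

dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[:500]"
)
dataset = dataset.map(format_examples, batched=True)
```
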

6. **Application of LoRA Adapters:**

Incorporated **LoRA adapters** for efficient parameter tuning during fine-tuning.

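
A sketch using Unsloth's `get_peft_model`; the rank, alpha, and target modules below are typical values and are not stated in this card:

```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                  # LoRA rank (assumed; not stated in this card)
    lora_alpha=16,
    lora_dropout=0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",  # memory-efficient checkpointing
    random_state=3407,
)
```
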

7. **Supervised Fine-Tuning:**

Fine-tuned the model with **SFTTrainer** using optimized hyperparameters; training completed in 44 minutes.

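
A sketch of the training setup; the hyperparameters are illustrative, and newer `trl` releases move `dataset_text_field`/`max_seq_length` into `SFTConfig`:

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # field produced in the dataset-preparation step
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # illustrative hyperparameters
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,                       # or bf16=True on Ampere+ GPUs
        logging_steps=10,
        output_dir="outputs",
        report_to="wandb",               # experiment tracking set up earlier
    ),
)
trainer.train()
```
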

8. **Post-Fine-Tuning Inference:**

Evaluated the model’s improved performance by testing it on the same medical question after fine-tuning.

9. **Saving and Loading:**

Stored the fine-tuned model, including **LoRA adapters**, for easy future use and deployment.

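
A minimal sketch of saving and reloading; the local directory name is a placeholder:

```python
# Save the LoRA adapters and tokenizer locally (directory name is a placeholder).
model.save_pretrained("deepseek-r1-medical-cot-lora")
tokenizer.save_pretrained("deepseek-r1-medical-cot-lora")

# Reload later: Unsloth attaches the saved adapters back onto the 4-bit base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="deepseek-r1-medical-cot-lora",
    max_seq_length=2048,
    load_in_4bit=True,
)
```
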

10. **Model Deployment:**

Pushed the fine-tuned model to **Hugging Face Hub** in **GGUF format** with 4-bit quantization enabled for efficient use.

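
A sketch of the export step using Unsloth's GGUF helper; the repository id is a placeholder, and `q4_k_m` is assumed as the 4-bit quantization variant since the card only states 4-bit:

```python
# Convert to GGUF and push to the Hub in one step.
model.push_to_hub_gguf(
    "your-username/DeepSeek-R1-Medical-CoT-GGUF",  # placeholder repo id
    tokenizer,
    quantization_method="q4_k_m",                  # assumed 4-bit GGUF variant
    token="hf_...",                                # Hugging Face write token
)
```
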

## Notebook

Access the implementation notebook for this model [here](https://github.com/SURESHBEEKHANI/Advanced-LLM-Fine-Tuning/blob/main/Deep-seek-R1-Medical-reasoning-SFT.ipynb). This notebook provides detailed steps for fine-tuning and deploying the model.