# LLaMA-3.1-8B SFT (Prompt Masking)

LLaMA-3.1-8B fine-tuned with supervised instruction tuning (SFT) using prompt masking, i.e. the loss is computed only on response tokens.
## Training Details
- Base Model: meta-llama/Llama-3.1-8B
- Dataset: UltraChat-200K + SafetyLlama (~200K examples)
- Training: 1 epoch (6326 steps)
- Prompt Masking: Enabled (loss on response tokens only)
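
Prompt masking can be implemented by setting the label ids for the prompt portion to `-100`, which PyTorch's cross-entropy loss ignores. Below is a minimal sketch assuming a plain prompt/response pair and the `transformers` tokenizer; it illustrates the idea and is not the exact training code used here.

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

def build_masked_example(prompt: str, response: str, max_len: int = 2048):
    # Tokenize prompt and response separately so we know where the prompt ends.
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + response_ids + [tokenizer.eos_token_id])[:max_len]
    labels = list(input_ids)

    # -100 is ignored by torch.nn.CrossEntropyLoss, so gradients flow only
    # through the response (and EOS) tokens.
    prompt_len = min(len(prompt_ids), len(labels))
    labels[:prompt_len] = [-100] * prompt_len

    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),
    }
```

Passing `labels` built this way to a `transformers` causal-LM forward pass reproduces the "loss on response tokens only" behaviour described above.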
## Evaluation Results
| Benchmark | Metric | Baseline (pre-SFT Llama-3.1-8B) | This Model |
|---|---|---|---|
| GSM8K | Accuracy | 16.4% | 32.7% |
| MMLU | Accuracy | 58.1% | 58.2% |
| Simple Safety Tests (SST) | Safety score | 62.0% | 77.0% |
| AlpacaEval | LC win rate | 1.57% | 4.5% |
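
To try the checkpoint locally, a standard `transformers` loading snippet along these lines should work; the generation settings below are illustrative assumptions, not tuned values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "garg-aayush/llama31-8b-sft-mask"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain what prompt masking does during SFT."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```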
## Files

- `eval_baseline/`: Baseline evaluation results for the pre-fine-tuning Llama-3.1-8B model.
## Reference

Part of CS336 Assignment 5 (SFT Instruction Tuning). See `building-from-scratch/sft` for details.