# SmolLM2-360M - KTO

## Model Description

This model is a LoRA adapter fine-tuned from HuggingFaceTB/SmolLM2-360M using the KTO training method.

**Kahneman-Tversky Optimization (KTO)** - binary preference optimization based on Prospect Theory.

This model was developed as part of thesis research on LLM alignment using preference optimization methods.
## Model Details
| Property | Value |
|---|---|
| Base Model | HuggingFaceTB/SmolLM2-360M |
| Training Method | KTO |
| Model Type | LoRA Adapter |
| Training Date | December 2025 |
| Framework | PyTorch + Transformers + PEFT |
## Benchmark Results
| Benchmark | Score |
|---|---|
| HellaSwag (10-shot) | 0.394 |
| TruthfulQA (0-shot MC2) | 0.474 |
| MMLU-Mini (5-shot) | 0.254 |
## Comparative Analysis
The following chart compares this method against other training approaches on the same base model:
## Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Batch Size | 2 |
| Gradient Accumulation | 8 |
| Effective Batch Size | 16 |
| Learning Rate | 2e-4 |
| Max Sequence Length | 512 |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Dataset | UltraFeedback Binarized |
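The effective batch size in the table follows from the per-device batch size and the number of gradient-accumulation steps; a quick arithmetic check:

```python
# Effective batch size = per-device batch size x gradient accumulation steps
per_device_batch_size = 2
gradient_accumulation_steps = 8

effective_batch_size = per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```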
## Usage

### Loading as LoRA Adapter
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M")

# Load adapter
model = PeftModel.from_pretrained(base_model, "Nishef/SmolLM2-360M-Full_KTO_20251225_020028")

# Generate text
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Methodology

### KTO

**Kahneman-Tversky Optimization** - binary preference optimization based on Prospect Theory.

Key features:
- Binary feedback signals (thumbs up/down)
- No need for paired preference data
- Reference model for KL-divergence regularization
- Prospect Theory-inspired loss function
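The loss above can be sketched numerically. The snippet below is an illustrative per-example KTO loss, not the training code used for this model; the function name, default β, and the reference point `z0` (in practice an estimate of the policy/reference KL) are assumptions following the KTO formulation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(log_ratio, desirable, z0=0.0, beta=0.1,
             lambda_d=1.0, lambda_u=1.0):
    """Illustrative per-example KTO loss.

    log_ratio: log pi_theta(y|x) - log pi_ref(y|x)
    desirable: True for thumbs-up examples, False for thumbs-down
    z0:        reference point (estimated policy/reference KL)
    """
    if desirable:
        # Desirable examples are rewarded for a log-ratio above z0
        value = lambda_d * sigmoid(beta * (log_ratio - z0))
        weight = lambda_d
    else:
        # Undesirable examples are rewarded for a log-ratio below z0
        value = lambda_u * sigmoid(beta * (z0 - log_ratio))
        weight = lambda_u
    return weight - value

# Loss decreases as the policy favors a desirable completion more strongly
print(kto_loss(2.0, True) < kto_loss(-2.0, True))  # True
```

The sigmoid value function saturates on both sides, which is the Prospect Theory-inspired part: gains and losses relative to the reference point have diminishing impact.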
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{smollm2_360m_kto_2025,
  title = {SmolLM2-360M Fine-tuned with KTO},
  author = {Thesis Research},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Nishef/SmolLM2-360M-Full_KTO_20251225_020028}
}
```
Repository Structure
.
βββ adapter_config.json # LoRA configuration
βββ adapter_model.safetensors # Model weights
βββ tokenizer files # Tokenizer configuration
βββ eval_summary.csv # Evaluation results
βββ thesis_plots/ # Visualization assets
β βββ benchmark_results.png
β βββ training_loss.png
βββ README.md # This file
## Acknowledgments
- Base Model: HuggingFaceTB/SmolLM2-360M
- Training Framework: Hugging Face Transformers
- Fine-tuning Library: PEFT
## License

This model is released under the Apache 2.0 license.