# qwen2.5-7b-instruct-opd-v1
This repository provides a merged, full-weight model produced by supervised fine-tuning (SFT) for task-oriented instruction following.
## Training Objective
Improve instruction following, action consistency, and response reliability in practical workflows.
## Training Configuration
- Method: SFT (TRL SFTTrainer + Transformers, full-model)
- Base model ID: unsloth/Qwen2.5-7B-Instruct
- Validation split ratio (during dataset build): 0.05
- Max sequence length: 512
- Max steps (actual): 4
- Epochs: 1
- Learning rate: 1e-6
- Per-device train batch size: 2
- Per-device eval batch size: 2
- Gradient accumulation steps: 8
- Effective global batch size: 16
- Warmup ratio: 0.1
- Weight decay: 0.05
- LR scheduler: cosine
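The effective global batch size of 16 follows from the per-device batch size and gradient accumulation; a single training device is an assumption inferred from the numbers above, not stated in the configuration:

```python
# Effective global batch size = per-device batch * gradient accumulation * device count.
# Values are taken from the configuration above; num_devices = 1 is an assumption
# consistent with the stated effective batch size of 16.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # assumed single GPU

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 16
```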
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "uchkw/qwen2.5-7b-instruct-opd-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```
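For chat-style inference, prompts should go through the tokenizer's chat template (`tokenizer.apply_chat_template`). As a rough illustration of what that produces, Qwen2.5 models use a ChatML-style layout; the sketch below hand-builds that layout for clarity only, and in real use you should rely on `apply_chat_template` rather than this assumed template string:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Hand-rolled ChatML-style prompt mimicking the Qwen2.5 layout.

    Illustrative only: in practice, call tokenizer.apply_chat_template,
    which is the authoritative source of the model's template.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "List three SQL aggregate functions.",
)
print(prompt)
```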
## Training Data / Sources & License (IMPORTANT)
- DBBench source dataset licenses:
  - u-10bei/dbbench_sft_dataset_react: MIT
  - u-10bei/dbbench_sft_dataset_react_v2: MIT
  - u-10bei/dbbench_sft_dataset_react_v3: MIT
  - u-10bei/dbbench_sft_dataset_react_v4: MIT
- DBBench data is further mixed with rule-based synthetic gap-fill samples, generated with:
  - Teacher model: Qwen/Qwen2.5-72B-Instruct-AWQ
  - Comparison student model: unsloth/Qwen2.5-7B-Instruct
- The ALFWorld portion is rule-based synthetic world-task data for instruction-following and action-format rehearsal.
- Compliance: Users must comply with the base model's terms of use.