# GPT-OSS: Finetuned on Custom Instruction Data
This repository contains a finetuned version of GPT-OSS, an open-source LLM, trained on custom, high-quality instruction/agent data. The goal of this finetuning is to enhance:
- Instruction following
- Reasoning
- Step-by-step task execution
- Conversational quality
- Task planning & agent-style responses
This finetuned model retains the capabilities of the base LLM while becoming more aligned, structured, and usable for real-world assistant-style tasks.
## Model Details

| Property | Details |
|---|---|
| Base Model | GPT-OSS (20B) |
| Finetuning Type | Supervised Finetuning (SFT) |
| Format Used | Merged LoRA, exported as FP16 / MXFP4 depending on release |
| Tokenizer | Same as the base GPT-OSS tokenizer |
| Architecture | Decoder-only Transformer |
| Context Length | As per the base model (typically 4k-8k tokens) |
## Training

### Training Objective

The goal of finetuning was to improve:
- Natural language understanding
- Structured reasoning
- Agent-style task completion
- Multi-turn conversation coherence
- Instruction following
- Helpfulness & clarity
## Dataset
The model was trained on a custom instruction dataset containing:
- Conversational instructions
- Agent-like ReAct-style prompts
- Structured reasoning demonstrations
- Multi-turn dialog tasks
- Task-planning instructions
- Domain-specific prompts for practical usage

All data was manually curated and aligned for safety and clarity. The dataset is private but can be replaced with your own custom instruction set.
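As a rough illustration (the actual dataset is private and its schema may differ), a single training record could follow a simple instruction/response shape like the hypothetical example below:

```python
# Hypothetical shape of one instruction-tuning record (illustrative only;
# the real dataset schema and contents are private).
example_record = {
    "instruction": "Plan the steps needed to summarize a meeting transcript.",
    "input": "",  # optional extra context for the task
    "output": (
        "1. Read the transcript and identify the speakers.\n"
        "2. Extract key decisions and action items.\n"
        "3. Write a concise summary grouped by topic."
    ),
}
```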
## How the Model Was Trained
Finetuning was performed in Google Colab using:
- HuggingFace Transformers
- PEFT / LoRA
- bitsandbytes
- A custom training notebook
The Colab notebook used for finetuning is included in a separate dataset repo:
Training Notebook: `gpt_oss_(20B)_Fine_tuning.ipynb`

(You may replace this with your actual Hugging Face link.)
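As a rough sketch of how such a LoRA-based SFT setup looks with Transformers and PEFT (the rank, target modules, and other settings below are illustrative assumptions, not the exact values used in the notebook):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_id = "openai/gpt-oss-20b"

# Load the base model; device_map="auto" spreads weights across available devices
# (requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Illustrative LoRA configuration; the actual ranks and target modules used
# in training may differ.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

The wrapped model can then be trained on the instruction dataset with the standard `transformers` Trainer (or a similar SFT trainer).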
## Model Variants Available
| Variant Name | Description | Best Use |
|---|---|---|
| `merged_16bit` | FP16 merged model | vLLM, SGLang, servers |
| `mxfp4` | 4-bit quantized | Local GPU (4-8 GB VRAM), CPU inference |
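For reference, a `merged_16bit`-style variant is typically produced by folding the trained LoRA adapter back into the base weights. A minimal PEFT sketch, assuming a hypothetical adapter path:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_id = "openai/gpt-oss-20b"
adapter_path = "path/to/lora_adapter"  # hypothetical path to the trained adapter

# Load the base model in FP16 and attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_path)

# Fold the adapter weights into the base weights and save a standalone FP16 model.
merged = model.merge_and_unload()
merged.save_pretrained("gpt_oss_finetuned_merged_16bit")
```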
## Intended Use

This model is useful for:
- AI assistants
- Chatbots
- Reasoning tasks
- Educational tools
- Task-execution agents
- General conversation
- Planning / workflow guidance
## Limitations
Like any LLM, GPT-OSS has limitations:
- May hallucinate facts
- Not suitable for legal, medical, or financial advice
- Not trained for harmful or unsafe domains
- May occasionally produce incorrect reasoning
## Safety
The dataset was manually curated to avoid:
- Toxic content
- Hate speech
- Violence
- Personal data
- Malware instructions

Nevertheless, always evaluate outputs before deploying to production.
## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Neo404/gpt_oss_finetuned"

# Load the finetuned model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Explain how reinforcement learning works in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 200 new tokens and decode the result.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
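Since the model is instruction-tuned, you will usually get better results by formatting the prompt with the tokenizer's chat template rather than passing raw text. A minimal variant of the same example, assuming the tokenizer ships a chat template:

```python
# Build the prompt via the chat template (assumes the tokenizer provides one).
messages = [
    {"role": "user", "content": "Explain how reinforcement learning works in simple terms."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```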
## Deployment Notes
- For vLLM: use the `merged_16bit` model
- For a local GPU (e.g. GTX 1650): use the `mxfp4` 4-bit quantized version
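For serving with vLLM, a minimal offline-inference sketch looks roughly like this (assuming the model ID points at the merged FP16 release):

```python
from vllm import LLM, SamplingParams

# Load the merged FP16 variant; vLLM handles batching and KV-cache management.
llm = LLM(model="Neo404/gpt_oss_finetuned")
params = SamplingParams(temperature=0.7, max_tokens=200)

outputs = llm.generate(
    ["Explain how reinforcement learning works in simple terms."], params
)
print(outputs[0].outputs[0].text)
```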
## License
This model follows the license of the original GPT-OSS base model. Users must comply with the original terms when using derivative models.