Pinkstack
/

DistilGPT-OSS-qwen3-4B

Text Generation

text-generation-inference

Model card Files Files and versions

DistilGPT-OSS-qwen3-4B / README.md

Pinkstack's picture

Update README.md

dbe8be3 verified 4 months ago

|

3.38 kB

	---
	license: apache-2.0
	base_model:
	- Qwen/Qwen3-4B-Thinking-2507
	pipeline_tag: text-generation
	tags:
	- cot
	- code
	- gpt_oss
	- conversational
	- distillation
	- math
	---
	This is the bf16 safetensors variant
	![Distil gpt oss logo](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/RxoOFH7vZmoyyKOUlB_oX.png)

	# What it is
	DistilGPT-OSS-qwen3-4B is a Qwen3 4B-2507 thinking fine tune, it supports up to 256K tokens of input and output (aka total context) and can think for up to 65536 tokens when set to high reasoning effort. unlike the original qwen3, this model was fine-tuned on GPT-OSS reasoning outputs (unlike Deepseek r1 outputs which qwen3 was probably fine-tuned on for advanced reasoning). By fine-tuning on GPT-OSS outputs, the model was able to learn how to think efficiently, follow instructions better, and the new ability to think with a certain effort based on how much you want it to think.

	⚠️This model is NOT as censored as the original GPT-OSS, we focused on performance rather than censorship. The model is still safety trained, it would just allow for more "creative" prompts, unlike GPT-OSS. We are not responsible for what the model generates.

	Keep in mind, this is a community project and we are NOT related to qwen by Alibaba nor GPT-OSS by OpenAi.

	# Format
	This is the chat format of this model (you can also check the Jinja template file in "Files and versions"):
	```
	<\|im_start\|>system
	You are a helpful assistant


	Reasoning effort: low<\|im_end\|>
	<\|im_start\|>user
	Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,<\|im_end\|>
	<\|im_start\|>assistant
	<think>
	Just continue: 13,21,34,...
	</think>
	The Fibonacci sequence continues by adding the two preceding numbers.
	After 8 the next term is 13 (1 + 8 = 9 – 1 = 8 + 5 = 13), then 21 (8 + 13 = 21), followed by 34 (13 + 21 = 34), and so on.

	So the sequence starts:

	1, 1, 2, 3, 5, 8, 13, 21, 34, …<\|im_end\|>
	```

	As you can see, you set the reasoning effort via the system prompt. We recommend going 2 lines down and only then putting "Reasoning effort: [low,medium,high]. For your information that output was generated by our model.

	# Examples

	1) "Is a banana an animal?" Reasoning was set to high.
	![Is a banana an animal?](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/f1N8knMusup4dugZ2WREB.png)
	2) "Make an HTML website about yourself" Reasoning was set to medium.
	![Write an HTML website about yourself](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/azInLvZ1KGpT5DXT2zCyV.png)
	3) "translate this to chinese: Hello! I am ChatGPT. A large language model by OpenAi." Reasoning was set to low.
	![translate this to chinese: Hello! I am ChatGPT. A large language model by OpenAi.](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/YH4Q0UY3aqeHRNhOgWv_V.png)

	As you can see, based on the reasoning effort of the model and your prompt, the model would think for a different amount of time.
	Keep in mind, these tests were done in LM Studio, GGUF q8_0 on a single consumer card (rtx 3080) where we got 95 - 80 Tokens/Second on 8192 context.

	# Additional information

	The model was trained using unsloth, using a mix of private datasets and public datasets.