---
license: apache-2.0
base_model:
- Qwen/Qwen3-4B-Thinking-2507
pipeline_tag: text-generation
tags:
- cot
- code
- gpt_oss
- conversational
- distillation
- math
---
This is the bf16 safetensors variant.

## What it is
DistilGPT-OSS-qwen3-4B is a thinking fine-tune of Qwen3-4B-Thinking-2507. It supports up to 256K tokens of total context (input plus output) and can think for up to 65,536 tokens when set to high reasoning effort. Unlike the original Qwen3, this model was fine-tuned on GPT-OSS reasoning outputs rather than the DeepSeek-R1 outputs that Qwen3 was likely fine-tuned on for advanced reasoning. By fine-tuning on GPT-OSS outputs, the model learned to think more efficiently, follow instructions better, and adjust how long it thinks based on the requested reasoning effort.

⚠️ This model is NOT as censored as the original GPT-OSS; we focused on performance rather than censorship. The model is still safety-trained, but it allows more "creative" prompts than GPT-OSS does. We are not responsible for what the model generates.

Keep in mind that this is a community project; we are NOT affiliated with Qwen by Alibaba or with GPT-OSS by OpenAI.
## Format
This is the chat format of this model (you can also check the Jinja template file in "Files and versions"):
```
<|im_start|>system
You are a helpful assistant
Reasoning effort: low<|im_end|>
<|im_start|>user
Continue the fibonacci sequence: 1, 1, 2, 3, 5, 8,<|im_end|>
<|im_start|>assistant
<think>
Just continue: 13,21,34,...
</think>
The Fibonacci sequence continues by adding the two preceding numbers.
After **8** the next term is **13** (1 + 8 = 9 – 1 = 8 + 5 = 13), then **21** (8 + 13 = 21), followed by **34** (13 + 21 = 34), and so on.
So the sequence starts:
**1, 1, 2, 3, 5, 8, 13, 21, 34, …**<|im_end|>
```
As you can see, you set the reasoning effort via the system prompt. We recommend writing your system instructions first and only then putting "Reasoning effort: [low/medium/high]" on its own line, as in the example above. For your information, the output above was generated by our model.
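
Below is a minimal sketch of applying this format with the Transformers library and the tokenizer's built-in chat template. The repo id is a placeholder for this model's actual path, and the generation settings are illustrative:

```python
# Minimal sketch: load the bf16 safetensors with Transformers and set the
# reasoning effort through the system prompt. The model id below is a
# placeholder; point it at this repo (or a local download) instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/DistilGPT-OSS-qwen3-4B"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bf16 safetensors variant
    device_map="auto",
)

# System instructions first, then the reasoning effort on its own line.
messages = [
    {"role": "system", "content": "You are a helpful assistant\nReasoning effort: low"},
    {"role": "user", "content": "Continue the fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```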
## Examples
- "Is a banana an animal?" Reasoning was set to high.

- "Make an HTML website about yourself" Reasoning was set to medium.

- "translate this to chinese: Hello! I am ChatGPT. A large language model by OpenAi." Reasoning was set to low.

As you can see, the model thinks for a different amount of time depending on the reasoning effort and on your prompt. Keep in mind that these tests were done in LM Studio with the GGUF q8_0 quant on a single consumer card (RTX 3080), where we got 80-95 tokens/second at 8192 context.
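
If you run the GGUF through LM Studio's local server, the same system-prompt convention applies over its OpenAI-compatible API. The sketch below assumes the default http://localhost:1234/v1 endpoint and a placeholder model identifier (use whatever name LM Studio shows for the loaded model):

```python
# Minimal sketch of querying the model via LM Studio's OpenAI-compatible
# local server. Endpoint and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="distilgpt-oss-qwen3-4b",  # placeholder; match your loaded model's name
    messages=[
        {"role": "system", "content": "You are a helpful assistant\nReasoning effort: high"},
        {"role": "user", "content": "Is a banana an animal?"},
    ],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```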
## Additional information
The model was trained with Unsloth on a mix of private and public datasets.
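
For reference, this is roughly what an Unsloth LoRA setup for such a fine-tune looks like; the sequence length, LoRA rank, and target modules below are illustrative assumptions, not our actual training recipe:

```python
# Rough sketch of an Unsloth fine-tuning setup (illustrative values only).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Thinking-2507",  # the base model of this fine-tune
    max_seq_length=32768,   # assumption; pick what fits your hardware
    load_in_4bit=True,      # QLoRA-style loading to reduce VRAM usage
)

# Attach LoRA adapters to the usual attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# Training then proceeds with a standard TRL SFTTrainer loop over reasoning
# traces formatted with the chat template shown in the Format section.
```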