Model Card for Model ID

Since the text-only performance of Qwen/Qwen3-VL-8B-Instruct is better than that of Qwen/Qwen3-8B in no-thinking mode, I copied the weights of vl_model.model.language_model into a standalone text-only model.

The Qwen3ForCausalLM config comes from the text_config of the original Qwen/Qwen3-VL-8B-Instruct, and the tokenizer comes from Qwen/Qwen3-8B.
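The weight copy described above can be sketched roughly as follows. This is a minimal sketch, not the exact script used: `copy_language_model` is a hypothetical helper name, and it assumes the text-only model was built from the VL model's `text_config` so that the two state dicts line up key-for-key.

```python
# Sketch of copying the text backbone out of a VL model into a
# text-only causal LM. Assumes the text model was constructed from
# the VL model's text_config, so the module structures match.
import torch

def copy_language_model(vl_model, text_model):
    """Copy vl_model.model.language_model and the LM head into a
    text-only causal LM with a matching architecture."""
    # strict=True (the default) raises if any keys mismatch, which
    # guards against a config/architecture mismatch between the two.
    text_model.model.load_state_dict(
        vl_model.model.language_model.state_dict()
    )
    text_model.lm_head.load_state_dict(vl_model.lm_head.state_dict())
    return text_model
```

On the real checkpoints this would amount to loading Qwen/Qwen3-VL-8B-Instruct, instantiating a `Qwen3ForCausalLM` from `vl_model.config.text_config`, calling the helper, and then saving the result together with the Qwen/Qwen3-8B tokenizer (exact loader class names depend on your transformers version).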

FYI: on AIME25 this model scores 47.71, while the original Qwen3-VL-8B-Instruct scores 47.917.

(settings: max_length=32k including prompt, temperature=1.0, top_p=1.0, top_k=40, repetition_penalty=1.0, presence_penalty=2.0)
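For reproducing the evaluation, the settings above can be collected into a sampling config. This is a sketch under an assumption: the presence_penalty parameter suggests a vLLM-style engine, where these keyword names would be passed to `SamplingParams` (max_length here counts the prompt, so the output budget is 32k minus the prompt length).

```python
# Sampling settings from the AIME25 evaluation above, as keyword
# arguments for a vLLM-style SamplingParams (engine choice is an
# assumption on my part).
sampling_kwargs = dict(
    temperature=1.0,
    top_p=1.0,
    top_k=40,
    repetition_penalty=1.0,
    presence_penalty=2.0,
)
```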
