Model Card: Qwen3-VL-8B-Instruct (text-only extraction)
Since the text-only performance of Qwen/Qwen3-VL-8B-Instruct is better than that of Qwen/Qwen3-8B in non-thinking mode, I copied the weights of vl_model.model.language_model into a standalone text-only model.
The Qwen3ForCausalLM config is taken from the text_config of the original Qwen/Qwen3-VL-8B-Instruct, and the tokenizer comes from Qwen/Qwen3-8B.
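The extraction described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the class names, attribute paths (`vl_model.model.language_model`, `lm_head`), and `strict=False` tolerance for VL-only buffers are assumptions that may differ across transformers versions.

```python
def export_text_model(vl_id="Qwen/Qwen3-VL-8B-Instruct",
                      tok_id="Qwen/Qwen3-8B",
                      out_dir="qwen3-8b-from-vl"):
    """Copy the VL model's language stack into a plain Qwen3ForCausalLM.

    Sketch only: exact transformers class names for Qwen3-VL are
    assumptions and depend on the installed transformers version.
    """
    # Imports deferred so the sketch can be read/imported without
    # transformers installed.
    from transformers import (AutoConfig, AutoModelForImageTextToText,
                              AutoTokenizer, Qwen3Config, Qwen3ForCausalLM)

    # Load the full vision-language model and the text-only sub-config.
    vl_model = AutoModelForImageTextToText.from_pretrained(vl_id, torch_dtype="bfloat16")
    text_config = AutoConfig.from_pretrained(vl_id).text_config

    # Build an empty text-only model from that config.
    lm = Qwen3ForCausalLM(Qwen3Config(**text_config.to_dict()))

    # Copy the decoder stack and the output head; strict=False tolerates
    # any VL-specific buffers that have no counterpart in the text model.
    lm.model.load_state_dict(vl_model.model.language_model.state_dict(), strict=False)
    lm.lm_head.load_state_dict(vl_model.lm_head.state_dict())

    # Save weights plus the Qwen3-8B tokenizer, as described in the card.
    lm.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(tok_id).save_pretrained(out_dir)
```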
FYI: on AIME25 this model scores 47.71, while the original Qwen3-VL-8B-Instruct scores 47.917.
(settings: max_length=32k including the prompt, temperature=1.0, top_p=1.0, top_k=40, repetition_penalty=1.0, presence_penalty=2.0)
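For reference, the evaluation settings above collected as a plain dict (framework-agnostic; note that presence_penalty is supported by samplers such as vLLM's SamplingParams but not by transformers' generate()):

```python
# Sampling settings reported for the AIME25 comparison above.
aime25_sampling = {
    "max_length": 32768,        # 32k total budget, including the prompt
    "temperature": 1.0,
    "top_p": 1.0,
    "top_k": 40,
    "repetition_penalty": 1.0,
    "presence_penalty": 2.0,    # available in e.g. vLLM, not in transformers generate()
}
```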