Model Card: Qwen3-VL-8B-Instruct (text-only extraction)
Since the text-only performance of Qwen/Qwen3-VL-8B-Instruct is better than that of Qwen/Qwen3-8B in non-thinking mode, I copied the weights of vl_model.model.language_model into a standalone text-only model.
The Qwen3ForCausalLM config is taken from the text_config of the original Qwen/Qwen3-VL-8B-Instruct, and the tokenizer comes from Qwen/Qwen3-8B.
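The extraction described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the class names, attribute paths (`vl_model.model.language_model`, `lm_head`), and `strict=False` tolerance for VL-only buffers are assumptions that may differ across transformers versions.

```python
def export_text_model(vl_id="Qwen/Qwen3-VL-8B-Instruct",
                      tok_id="Qwen/Qwen3-8B",
                      out_dir="qwen3-8b-from-vl"):
    """Copy the VL model's language stack into a plain Qwen3ForCausalLM.

    Sketch only: exact transformers class names for Qwen3-VL are
    assumptions and depend on the installed transformers version.
    """
    # Imports deferred so the sketch can be read/imported without
    # transformers installed.
    from transformers import (AutoConfig, AutoModelForImageTextToText,
                              AutoTokenizer, Qwen3Config, Qwen3ForCausalLM)

    # Load the full vision-language model and the text-only sub-config.
    vl_model = AutoModelForImageTextToText.from_pretrained(vl_id, torch_dtype="bfloat16")
    text_config = AutoConfig.from_pretrained(vl_id).text_config

    # Build an empty text-only model from that config.
    lm = Qwen3ForCausalLM(Qwen3Config(**text_config.to_dict()))

    # Copy the decoder stack and the output head; strict=False tolerates
    # any VL-specific buffers that have no counterpart in the text model.
    lm.model.load_state_dict(vl_model.model.language_model.state_dict(), strict=False)
    lm.lm_head.load_state_dict(vl_model.lm_head.state_dict())

    # Save weights plus the Qwen3-8B tokenizer, as described in the card.
    lm.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(tok_id).save_pretrained(out_dir)
```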
FYI: on AIME25 this model scores 47.71, while the original Qwen3-VL-8B-Instruct scores 47.917.
(settings: max_length=32k including the prompt, temperature=1.0, top_p=1.0, top_k=40, repetition_penalty=1.0, presence_penalty=2.0)
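For reference, the evaluation settings above collected as a plain dict (framework-agnostic; note that presence_penalty is supported by samplers such as vLLM's SamplingParams but not by transformers' generate()):

```python
# Sampling settings reported for the AIME25 comparison above.
aime25_sampling = {
    "max_length": 32768,        # 32k total budget, including the prompt
    "temperature": 1.0,
    "top_p": 1.0,
    "top_k": 40,
    "repetition_penalty": 1.0,
    "presence_penalty": 2.0,    # available in e.g. vLLM, not in transformers generate()
}
```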