Qwen2.5-Coder-1.5B evaluation on EvalPlus achieves a much lower pass@1 rate than the one reported in the paper

#9
by peterzsj6 - opened

Evaluation method used:

https://github.com/QwenLM/Qwen3-Coder/blob/main/qwencoder-eval/base/run_evaluate_cq2.5.sh
All benchmarks other than EvalPlus were skipped.

Model used:

https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B

Output of run_evaluate_cq2.5.sh:

[Image: output of run_evaluate_cq2.5.sh]
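For anyone comparing numbers, here is a minimal sketch of how pass@1 is commonly computed, using the standard unbiased pass@k estimator. Whether run_evaluate_cq2.5.sh computes it exactly this way is an assumption, and the function names and the sample results below are hypothetical, for illustration only. With one greedy sample per problem, pass@1 reduces to the fraction of problems whose single solution passes all tests.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number that pass all tests.
    """
    if n - c < k:
        # Fewer than k failing samples: any draw of k includes a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results: 4 problems, one greedy sample each, 3 solved.
example = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(mean_pass_at_k(example, k=1))  # 0.75
```

A gap between a local run and the paper can come from the inputs to this calculation (prompt format, decoding settings, which test set is scored) rather than from the formula itself, so those are worth checking first.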
