Nice gpt-oss distill
At least have the decency to distill from good models. What you made is pure trash, a waste of compute.
A distill that's 110B parameters larger?
Yes, they are this stupid. They distilled a 120B model into a 229B one.
Thanks for the comment, but just to correct the misinformation:
If MiniMax M2 were truly "pure trash," you'd see it reflected in the benchmarks, and you don't.
We welcome tough feedback, but it needs to be factual if it’s going to be useful. If you have specific technical points, we’re always happy to dive deep.
We open-sourced M2 so that everyone can use it freely and evaluate it transparently.
And honestly, if M2 doesn’t work for your needs, you’re absolutely free to use any other model. 😊