Nice gpt-oss distill
At least have the decency to distill from good models. What you made is pure trash, a waste of compute.
A distill that's 110B parameters larger?
Yes, they are this stupid. They distilled a 120B model into a 229B one.
Thanks for the comment, but just to correct the misinformation:
If MiniMax M2 were truly "pure trash," you'd see it reflected in the benchmarks, and you don't.
We welcome tough feedback, but it needs to be factual if it’s going to be useful. If you have specific technical points, we’re always happy to dive deep.
We open-sourced M2 so that everyone can use it freely and evaluate it transparently.
And honestly, if M2 doesn’t work for your needs, you’re absolutely free to use any other model. 😊