lol nice timing
Hey! I also just got the Q8_0 going myself; had to fuss with the template a little, but it is at least running on mainline llama.cpp (I didn't release any quants yet, as I don't think ik_llama.cpp has support for this exact version yet).
https://huggingface.co/ai-sage/GigaChat3-702B-A36B-preview-bf16/discussions/1
The smaller model isn't working; I spent much of today faffing around with it before giving the big one a shot: https://huggingface.co/ai-sage/GigaChat3-10B-A1.8B/discussions/1#691f55085f8746f8c9b8a24e Let me know if you have any luck with the smaller one!
Yep, just needed the Jinja fixed; that is also uploaded.
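For anyone else poking at the template issue, here is a minimal sketch that dumps the embedded Jinja chat template so you can see exactly what got baked into the GGUF. It assumes llama.cpp's bundled gguf C API (gguf.h on current trees; the same functions lived in ggml.h on older ones):

```cpp
#include <cstdio>
#include "gguf.h"

int main(int argc, char ** argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }

    // no_alloc = true: read metadata only, don't map the tensor data
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) { fprintf(stderr, "failed to open %s\n", argv[1]); return 1; }

    const auto kid = gguf_find_key(ctx, "tokenizer.chat_template");
    if (kid < 0) {
        printf("no tokenizer.chat_template key in this GGUF\n");
    } else {
        printf("%s\n", gguf_get_val_str(ctx, kid));
    }

    gguf_free(ctx);
    return 0;
}
```

Once you have a corrected template, recent llama.cpp builds can also override the embedded one at runtime via --chat-template-file, which saves re-converting the model.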
Q2 is meh, Q4 looks good
Nice! The smaller GigaChat3-10B-A1.8B also has the template error. Got it going after figuring out it is a DeepSeek lite variant and needs a small patch in ik_llama.cpp/llama.cpp to detect it properly, given it has only 26 layers instead of DeepSeek-V2-Lite's 27 layers (which is how the lite variant is detected, hah). A sketch of the idea is below.
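This is a hedged sketch of the shape of that patch, not the exact merged diff: the deepseek2 loader decides lite vs. full purely by layer count, so a 26-layer model falls through to the wrong tensor layout unless the check is widened. As a standalone toy of the heuristic:

```cpp
#include <cstdio>

// Upstream effectively does: is_lite = (n_layer == 27) for DeepSeek-V2-Lite.
// The patch adds the 26-layer GigaChat3-10B-A1.8B case alongside it.
static bool is_deepseek2_lite(int n_layer) {
    return n_layer == 27   // DeepSeek-V2-Lite
        || n_layer == 26;  // GigaChat3-10B-A1.8B
}

int main() {
    const int layers[] = {26, 27, 61};
    for (int n_layer : layers) {
        printf("n_layer = %d -> %s\n", n_layer,
               is_deepseek2_lite(n_layer) ? "lite" : "full");
    }
    return 0;
}
```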
I'll probably pass on the big GigaChat3-702B-A36B-preview for now, as you have it covered and I'm still trying to figure out a tensor dimension size issue on ik_llama.cpp, so I can't do my usual treatment tonight.
Cheers!
This just worked with llama.cpp as "DeepSeek V3".
The model differences are only the number of layers and such:
Attention heads: DeepSeek 128, GigaChat 64
KV heads: DeepSeek 128, GigaChat 64
Value head dim: DeepSeek 128, GigaChat 192
Layers: DeepSeek 61, GigaChat 64
Position embeddings: GigaChat slightly smaller (131k vs 163k)
If anyone experiences issues, I hope they'll report them.
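If you want to verify those numbers on your own file, here is a small sketch using the same bundled gguf C API. The key names follow the usual "{arch}.*" GGUF convention for deepseek2, and I'm assuming they're stored as u32 (which is the norm):

```cpp
#include <cstdio>
#include "gguf.h"

// print a single u32 metadata value, or <missing> if the key isn't there
static void print_u32(struct gguf_context * ctx, const char * key) {
    const auto kid = gguf_find_key(ctx, key);
    if (kid < 0) { printf("%-40s <missing>\n", key); return; }
    printf("%-40s %u\n", key, gguf_get_val_u32(ctx, kid));
}

int main(int argc, char ** argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }

    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) return 1;

    print_u32(ctx, "deepseek2.block_count");              // layers
    print_u32(ctx, "deepseek2.attention.head_count");     // attention heads
    print_u32(ctx, "deepseek2.attention.head_count_kv");  // KV heads
    print_u32(ctx, "deepseek2.attention.value_length");   // value head dim
    print_u32(ctx, "deepseek2.context_length");           // max position embeddings

    gguf_free(ctx);
    return 0;
}
```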
Hmm... For me llama.cpp says:
general.architecture str = deepseek2
Still works.
And I can see the promised optimizations for Russian.
I have a long test prompt: DeepSeek's tokenizer converts it to 12K tokens, while GigaChat fits it into just 8K.
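For anyone who wants to reproduce that comparison, here is a rough sketch against llama.cpp's C API; the bundled llama-tokenize example does much the same thing. One caveat: the llama_tokenize signature has drifted across releases, and older trees pass a llama_model* rather than a llama_vocab*. It loads the vocab only and counts tokens for a given prompt:

```cpp
#include <cstdio>
#include <cstring>
#include "llama.h"

static int count_tokens(const char * model_path, const char * text) {
    llama_model_params mparams = llama_model_default_params();
    mparams.vocab_only = true; // tokenizer only, skip loading the weights

    llama_model * model = llama_model_load_from_file(model_path, mparams);
    if (!model) return -1;
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // a call with a null buffer returns -(required token count)
    int n = llama_tokenize(vocab, text, (int) strlen(text), nullptr, 0,
                           /*add_special =*/ true, /*parse_special =*/ false);

    llama_model_free(model);
    return -n;
}

int main(int argc, char ** argv) {
    if (argc < 3) { fprintf(stderr, "usage: %s model.gguf 'prompt'\n", argv[0]); return 1; }
    llama_backend_init();
    printf("%d tokens\n", count_tokens(argv[1], argv[2]));
    llama_backend_free();
    return 0;
}
```

Run it once per GGUF with the same prompt to get a like-for-like token count for each tokenizer.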
Nice, I'd love to hear more of your thoughts on its multilingual abilities!
general.architecture str = deepseek2
Yes, that is correct. Internally, all of the DeepSeek variants fall under deepseek2, even the newer V3 and beyond.
And yes, I've tested GigaChat3-702B-A36B-preview on mainline, which did not need any special PR. The PR that is now merged was only for the smaller GigaChat3-10B-A1.8B version, which is the DeepSeek lite variant. Both models also work on ik_llama.cpp with this PR: https://github.com/ikawrakow/ik_llama.cpp/pull/995
Just make sure your chat template is properly fixed for both of the models.
Cheers!