lol nice timing
Hey! I also just got the Q8_0 going myself; had to fuss with the template a little, but it is at least running on mainline llama.cpp (I didn't release any quants yet, as I don't think ik_llama.cpp has support for this exact version yet).
https://huggingface.co/ai-sage/GigaChat3-702B-A36B-preview-bf16/discussions/1
The smaller model isn't working; I spent much of today faffing around with it before giving the big one a shot: https://huggingface.co/ai-sage/GigaChat3-10B-A1.8B/discussions/1#691f55085f8746f8c9b8a24e Let me know if you have any luck with the smaller one!
Yep, just needed the Jinja fixed; that is also uploaded.
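For anyone else poking at the template issue, here is a minimal sketch that dumps the embedded Jinja chat template so you can see exactly what got baked into the GGUF. It assumes llama.cpp's bundled gguf C API (gguf.h on current trees; the same functions lived in ggml.h on older ones):

```cpp
#include <cstdio>
#include "gguf.h"

int main(int argc, char ** argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }

    // no_alloc = true: read metadata only, don't map the tensor data
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) { fprintf(stderr, "failed to open %s\n", argv[1]); return 1; }

    const auto kid = gguf_find_key(ctx, "tokenizer.chat_template");
    if (kid < 0) {
        printf("no tokenizer.chat_template key in this GGUF\n");
    } else {
        printf("%s\n", gguf_get_val_str(ctx, kid));
    }

    gguf_free(ctx);
    return 0;
}
```

Once you have a corrected template, recent llama.cpp builds can also override the embedded one at runtime via --chat-template-file, which saves re-converting the model.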
Q2 is meh, Q4 looks good
Nice! The smaller GigaChat3-10B-A1.8B also has the template error. Got it going after figuring out it is a DeepSeek lite variant and needs a small patch in ik_llama.cpp/llama.cpp to detect it properly, given it has only 26 layers instead of DeepSeek-V2-Lite's 27 layers (which is how the lite variant is detected, hah). A sketch of the idea is below.
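This is a hedged sketch of the shape of that patch, not the exact merged diff: the deepseek2 loader decides lite vs. full purely by layer count, so a 26-layer model falls through to the wrong tensor layout unless the check is widened. As a standalone toy of the heuristic:

```cpp
#include <cstdio>

// Upstream effectively does: is_lite = (n_layer == 27) for DeepSeek-V2-Lite.
// The patch adds the 26-layer GigaChat3-10B-A1.8B case alongside it.
static bool is_deepseek2_lite(int n_layer) {
    return n_layer == 27   // DeepSeek-V2-Lite
        || n_layer == 26;  // GigaChat3-10B-A1.8B
}

int main() {
    const int layers[] = {26, 27, 61};
    for (int n_layer : layers) {
        printf("n_layer = %d -> %s\n", n_layer,
               is_deepseek2_lite(n_layer) ? "lite" : "full");
    }
    return 0;
}
```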
I'll probably pass on the big GigaChat3-702B-A36B-preview for now, as you have it covered and I'm still trying to figure out a tensor dimension size issue on ik_llama.cpp, so I can't do my usual treatment tonight.
Cheers!
This just worked with llama.cpp as "DeepSeek V3".
The model differences are only the number of layers and such:
Attention heads: DeepSeek 128, GigaChat 64
KV heads: DeepSeek 128, GigaChat 64
Value head dim: DeepSeek 128, GigaChat 192
Layers: DeepSeek 61, GigaChat 64
Position embeddings: GigaChat slightly smaller (131k vs 163k)
If anyone experiences issues, I hope they'll report them.
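If you want to verify those numbers on your own file, here is a small sketch using the same bundled gguf C API. The key names follow the usual "{arch}.*" GGUF convention for deepseek2, and I'm assuming they're stored as u32 (which is the norm):

```cpp
#include <cstdio>
#include "gguf.h"

// print a single u32 metadata value, or <missing> if the key isn't there
static void print_u32(struct gguf_context * ctx, const char * key) {
    const auto kid = gguf_find_key(ctx, key);
    if (kid < 0) { printf("%-40s <missing>\n", key); return; }
    printf("%-40s %u\n", key, gguf_get_val_u32(ctx, kid));
}

int main(int argc, char ** argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }

    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) return 1;

    print_u32(ctx, "deepseek2.block_count");              // layers
    print_u32(ctx, "deepseek2.attention.head_count");     // attention heads
    print_u32(ctx, "deepseek2.attention.head_count_kv");  // KV heads
    print_u32(ctx, "deepseek2.attention.value_length");   // value head dim
    print_u32(ctx, "deepseek2.context_length");           // max position embeddings

    gguf_free(ctx);
    return 0;
}
```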
Hmm... For me llama.cpp says:
general.architecture str = deepseek2
Still works.
And I can see the promised optimizations for Russian.
I have a long test prompt: DeepSeek's tokenizer converts it to 12K tokens, while GigaChat fits it into just 8K.
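For anyone who wants to reproduce that comparison, here is a rough sketch against llama.cpp's C API; the bundled llama-tokenize example does much the same thing. One caveat: the llama_tokenize signature has drifted across releases, and older trees pass a llama_model* rather than a llama_vocab*. It loads the vocab only and counts tokens for a given prompt:

```cpp
#include <cstdio>
#include <cstring>
#include "llama.h"

static int count_tokens(const char * model_path, const char * text) {
    llama_model_params mparams = llama_model_default_params();
    mparams.vocab_only = true; // tokenizer only, skip loading the weights

    llama_model * model = llama_model_load_from_file(model_path, mparams);
    if (!model) return -1;
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // a call with a null buffer returns -(required token count)
    int n = llama_tokenize(vocab, text, (int) strlen(text), nullptr, 0,
                           /*add_special =*/ true, /*parse_special =*/ false);

    llama_model_free(model);
    return -n;
}

int main(int argc, char ** argv) {
    if (argc < 3) { fprintf(stderr, "usage: %s model.gguf 'prompt'\n", argv[0]); return 1; }
    llama_backend_init();
    printf("%d tokens\n", count_tokens(argv[1], argv[2]));
    llama_backend_free();
    return 0;
}
```

Run it once per GGUF with the same prompt to get a like-for-like token count for each tokenizer.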
Nice, I'd love to hear more of your thoughts on its multilingual abilities!
general.architecture str = deepseek2
Yes, that is correct. Internally, all of the DeepSeek variants fall under deepseek2, even the newer V3 and beyond.
And yes, I've tested GigaChat3-702B-A36B-preview on mainline, which did not need any special PR. The PR that is now merged was only for the smaller GigaChat3-10B-A1.8B version, which is the DeepSeek lite variant. Both models also work on ik_llama.cpp with this PR: https://github.com/ikawrakow/ik_llama.cpp/pull/995
Just make sure your chat template is properly fixed for both of the models.
Cheers!