Question about prompts
Hello!
Sentence Transformers maintainer here. I'm very excited about this model!
I had a question about the prompts. The section here (https://huggingface.co/tencent/KaLM-Embedding-Gemma3-12B-2511#sentence-transformers-support) states that "Instruct: Given a query, retrieve documents that answer the query \n Query: " will be used automatically with model.encode_query, but that method actually uses the "query" prompt from the model's config: https://huggingface.co/tencent/KaLM-Embedding-Gemma3-12B-2511/blob/main/config_sentence_transformers.json#L7
However, that config does not define a "query" prompt, unlike https://huggingface.co/KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5/blob/main/config_sentence_transformers.json#L8.
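For context, the way `encode_query` picks its prompt can be roughly approximated as follows. This is a simplified, hypothetical emulation for illustration only (the helpers `resolve_prompt` and `apply_prompt` and the sample prompt value are made up), not the library's actual code. It assumes the Sentence Transformers v5 behavior where `encode_query` opts into the `"query"` prompt only if the config defines one, and otherwise falls back to `default_prompt_name`, if set.

```python
from typing import Optional


def resolve_prompt(
    prompts: dict[str, str],
    default_prompt_name: Optional[str] = None,
    prompt_name: Optional[str] = None,
) -> str:
    """Hypothetical, simplified emulation of how encode_query chooses a prompt."""
    # encode_query only selects the "query" prompt if the config defines it
    if prompt_name is None and "query" in prompts:
        prompt_name = "query"
    if prompt_name is not None:
        return prompts[prompt_name]
    # Otherwise fall back to the default prompt, if one is configured
    if default_prompt_name is not None:
        return prompts.get(default_prompt_name, "")
    return ""


def apply_prompt(prompt: str, texts: list[str]) -> list[str]:
    # The prompt is prepended verbatim; no separator is inserted
    return [prompt + text for text in texts]


# Config without a "query" prompt: the text is encoded as-is
print(apply_prompt(resolve_prompt({}), ["What is KaLM?"]))

# Config with a "query" prompt (hypothetical value, for illustration)
with_query = {"query": "Instruct: Given a query, retrieve documents that answer the query\nQuery:"}
print(apply_prompt(resolve_prompt(with_query), ["What is KaLM?"]))
```

In other words, without a "query" entry in the config, the instruction mentioned in the README would not be prepended automatically.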
Should the prompts be the same as in https://huggingface.co/KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5/blob/main/config_sentence_transformers.json#L8? I understand that different prompts are used for different tasks (as can be seen here: https://huggingface.co/tencent/KaLM-Embedding-Gemma3-12B-2511/blob/main/task_prompts.json), but I'm not fully sure about the format yet. For example, the README also uses "Instruct: Classifying the category of french news.\nQuery:", i.e. with no spaces around \n and none after Query:.
What should the correct "default prompt" be? Then we can add it to https://huggingface.co/tencent/KaLM-Embedding-Gemma3-12B-2511/blob/main/config_sentence_transformers.json nicely.
cc @YanshekWoo
- Tom Aarsen
Thank you for your question.
The prompt "Instruct: Classifying the category of French news.\nQuery:" follows the default prompt format, with no spaces around \n and none after Query:.
That said, a single extra space typically has little impact on embedding performance.
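A minimal sketch of the default format described above, assuming the instruction is inserted verbatim between "Instruct: " and the newline. The helper `build_prompt` and its `spaced` flag are hypothetical and only for illustration; the spaced variant mirrors the string quoted from the Sentence Transformers section of the README.

```python
def build_prompt(instruction: str, spaced: bool = False) -> str:
    """Wrap an instruction in the "Instruct: ...\\nQuery:" template (hypothetical helper)."""
    if spaced:
        # Variant with spaces around the newline and after "Query:"
        return f"Instruct: {instruction} \n Query: "
    # Default format: no spaces around the newline and none after "Query:"
    return f"Instruct: {instruction}\nQuery:"


default = build_prompt("Classifying the category of French news.")
spaced = build_prompt("Classifying the category of French news.", spaced=True)

# The prompt is prepended directly to the input text
print(repr(default + "Le gouvernement annonce un nouveau budget."))
print(repr(spaced + "Le gouvernement annonce un nouveau budget."))
```

Because Sentence Transformers prepends the prompt directly to each input, the default format produces "...\nQuery:" immediately followed by the text, with no space in between.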
Additionally, https://huggingface.co/tencent/KaLM-Embedding-Gemma3-12B-2511/blob/main/task_prompts.json currently contains only the instruction part of each prompt, not the full "Instruct: ...\nQuery:" template.
We will revise and re-upload that file so it contains the complete prompt format, to avoid this ambiguity.
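Until then, one way to get full prompts is to wrap each instruction-only entry in the default template. This sketch assumes task_prompts.json is a flat JSON object mapping task names to instruction strings (not verified here), and the task name below is hypothetical. `build_st_prompts` sketches a possible "prompts" section for config_sentence_transformers.json; leaving the "document" prompt empty is an assumption, not something confirmed in this thread.

```python
import json


def to_full_prompts(task_instructions: dict[str, str]) -> dict[str, str]:
    """Wrap each instruction-only entry in the default "Instruct: ...\\nQuery:" template."""
    return {task: f"Instruct: {instr}\nQuery:" for task, instr in task_instructions.items()}


def build_st_prompts(query_instruction: str) -> dict:
    """Hypothetical prompt section for config_sentence_transformers.json."""
    return {
        "prompts": {
            "query": f"Instruct: {query_instruction}\nQuery:",
            # Assumption: documents are encoded without an instruction
            "document": "",
        },
        "default_prompt_name": None,
    }


# Hypothetical instruction-only mapping, in the assumed task_prompts.json shape
instructions = {"ExampleNewsClassification": "Classifying the category of French news."}
print(json.dumps(to_full_prompts(instructions), indent=2))
print(json.dumps(build_st_prompts("Given a query, retrieve documents that answer the query"), indent=2))
```

With a "query" entry like this in the config, model.encode_query would pick up the retrieval prompt automatically, while task-specific prompts can still be passed explicitly via the prompt argument.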