Files changed (1)

README.md CHANGED (+6 −66)
@@ -7,7 +7,6 @@ tags:
   - sentence-transformers
   - sentence-similarity
   - feature-extraction
- - text-embeddings-inference
  ---
  # Qwen3-Embedding-0.6B
 
@@ -24,7 +23,6 @@ The Qwen3 Embedding model series is the latest proprietary model of the Qwen fam
  **Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.
 
  **Multilingual Capability**: The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.
-
  ## Model Overview
 
  **Qwen3-Embedding-0.6B** has the following features:
@@ -64,7 +62,6 @@ KeyError: 'qwen3'
 
  ```python
  # Requires transformers>=4.51.0
- # Requires sentence-transformers>=2.7.0
 
  from sentence_transformers import SentenceTransformer
 
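As the hunk context notes, loading this model with transformers older than 4.51.0 fails with `KeyError: 'qwen3'`, because the architecture is not yet registered. A minimal pre-flight check might look like this (a sketch; the helper names are ours, not from the card, and real projects should prefer `packaging.version` for full version parsing):

```python
def parse_version(v: str) -> tuple:
    # Minimal parser for plain "X.Y.Z" release strings.
    return tuple(int(part) for part in v.split("."))

def supports_qwen3(installed_transformers: str) -> bool:
    # The card requires transformers>=4.51.0; older versions raise
    # KeyError: 'qwen3' when loading Qwen3 checkpoints.
    return parse_version(installed_transformers) >= (4, 51, 0)

print(supports_qwen3("4.51.0"))  # True
print(supports_qwen3("4.50.3"))  # False
```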
@@ -168,66 +165,8 @@ scores = (embeddings[:2] @ embeddings[2:].T)
  print(scores.tolist())
  # [[0.7645568251609802, 0.14142508804798126], [0.13549736142158508, 0.5999549627304077]]
  ```
-
- ### vLLM Usage
-
- ```python
- # Requires vllm>=0.8.5
- import torch
- import vllm
- from vllm import LLM
-
- def get_detailed_instruct(task_description: str, query: str) -> str:
-     return f'Instruct: {task_description}\nQuery:{query}'
-
- # Each query must come with a one-sentence instruction that describes the task
- task = 'Given a web search query, retrieve relevant passages that answer the query'
-
- queries = [
-     get_detailed_instruct(task, 'What is the capital of China?'),
-     get_detailed_instruct(task, 'Explain gravity')
- ]
- # No need to add instruction for retrieval documents
- documents = [
-     "The capital of China is Beijing.",
-     "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
- ]
- input_texts = queries + documents
-
- model = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")
-
- outputs = model.embed(input_texts)
- embeddings = torch.tensor([o.outputs.embedding for o in outputs])
- scores = (embeddings[:2] @ embeddings[2:].T)
- print(scores.tolist())
- # [[0.7620252966880798, 0.14078938961029053], [0.1358368694782257, 0.6013815999031067]]
- ```
-
  📌 **Tip**: We recommend that developers customize the `instruct` according to their specific scenarios, tasks, and languages. Our tests have shown that in most retrieval scenarios, not using an `instruct` on the query side can lead to a drop in retrieval performance by approximately 1% to 5%.
 
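The 1% to 5% figure above refers to the query-side instruction format used throughout the card's snippets. As a sketch, the difference between an instructed and a plain query is just a formatted prefix (the variable names below are illustrative, not from the card):

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    # Same convention as the usage snippets: queries carry a one-sentence
    # task instruction; documents are embedded without one.
    return f'Instruct: {task_description}\nQuery:{query}'

task = 'Given a web search query, retrieve relevant passages that answer the query'
query = 'What is the capital of China?'

with_instruct = get_detailed_instruct(task, query)  # recommended for queries
without_instruct = query  # plain form; the card reports ~1%-5% worse retrieval
print(with_instruct)
```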
- ### Text Embeddings Inference (TEI) Usage
-
- You can either run / deploy TEI on NVIDIA GPUs as:
-
- ```bash
- docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7.2 --model-id Qwen/Qwen3-Embedding-0.6B --dtype float16
- ```
-
- Or on CPU devices as:
-
- ```bash
- docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2 --model-id Qwen/Qwen3-Embedding-0.6B
- ```
-
- And then, generate the embeddings by sending an HTTP POST request:
-
- ```bash
- curl http://localhost:8080/embed \
-     -X POST \
-     -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
-     -H "Content-Type: application/json"
- ```
-
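Whichever backend produces the embeddings (the sentence-transformers snippet kept above, or the removed vLLM/TEI paths), the printed score matrices are plain dot products of L2-normalized query and document vectors. A backend-independent sketch with made-up 2-D vectors (not real model outputs):

```python
import math

def normalize(v):
    # Scale a vector to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Made-up stand-ins: 2 query vectors followed by 2 document vectors.
embeddings = [normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [3.0, 4.0], [4.0, 3.0])]
queries, documents = embeddings[:2], embeddings[2:]

# Same scoring pattern as embeddings[:2] @ embeddings[2:].T in the snippets.
scores = [[dot(q, d) for d in documents] for q in queries]
print(scores)  # [[0.6, 0.8], [0.8, 0.6]]
```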
  ## Evaluation
 
  ### MTEB (Multilingual)
@@ -283,10 +222,11 @@ curl http://localhost:8080/embed \
  If you find our work helpful, feel free to give us a cite.
 
  ```
- @article{qwen3embedding,
-   title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
-   author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
-   journal={arXiv preprint arXiv:2506.05176},
-   year={2025}
+ @misc{qwen3-embedding,
+   title = {Qwen3-Embedding},
+   url = {https://qwenlm.github.io/blog/qwen3/},
+   author = {Qwen Team},
+   month = {May},
+   year = {2025}
  }
  ```