---
license: apache-2.0
tags:
  - embedding
  - safetensors
base_model: Qwen/Qwen3-Embedding-0.6B
pipeline_tag: feature-extraction
library_name: transformers
---

# Qwen3-Embedding-0.6B

Multi-format version of [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B), optimized for deployment.

## Model Information

| Property | Value |
|----------|-------|
| Base Model | [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) |
| Task | feature-extraction |
| Type | Text Model |
| Trust Remote Code | True |

## Available Versions

| Folder | Format | Description | Size |
|--------|--------|-------------|------|
| `safetensors-fp32/` | PyTorch FP32 | Baseline, highest accuracy | 2288 MB |
| `safetensors-fp16/` | PyTorch FP16 | GPU inference, ~50% smaller | 1152 MB |

## Usage

### PyTorch (GPU)

```python
from transformers import AutoModel, AutoTokenizer
import torch

# GPU inference with FP16
model = AutoModel.from_pretrained(
    "n24q02m/Qwen3-Embedding-0.6B",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16,
    trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Embedding-0.6B",
    subfolder="safetensors-fp16",
    trust_remote_code=True
)

# Inference
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state.mean(dim=1)  # Mean pooling
```

## Notes

1. **SafeTensors FP16** is the primary format for GPU inference.
2. Load the tokenizer from the same subfolder as the model.

## License

Apache 2.0 (following the base model's license).

## Credits

- Base Model: [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B)
- Conversion: PyTorch + SafeTensors
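
## CPU Inference (FP32)

The `safetensors-fp32/` folder can be loaded the same way for CPU-only or accuracy-sensitive runs. The snippet below is a minimal sketch, assuming the FP32 subfolder mirrors the FP16 layout shown above; it also uses mask-aware mean pooling, which ignores padding tokens and therefore matters once you embed batches of texts with different lengths (the example texts and variable names are illustrative only).

```python
from transformers import AutoModel, AutoTokenizer
import torch

# CPU inference with the FP32 weights (assumed to mirror the FP16 subfolder layout)
model = AutoModel.from_pretrained(
    "n24q02m/Qwen3-Embedding-0.6B",
    subfolder="safetensors-fp32",
    torch_dtype=torch.float32,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Embedding-0.6B",
    subfolder="safetensors-fp32",
    trust_remote_code=True
)

# Batch inference with padding
texts = ["Hello world", "Embeddings from a multi-format checkpoint"]
inputs = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mask-aware mean pooling: average only over real (non-pad) tokens
mask = inputs["attention_mask"].unsqueeze(-1).float()       # (batch, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)      # sum of real-token states
embeddings = summed / mask.sum(dim=1).clamp(min=1e-9)       # per-example mean
print(embeddings.shape)  # (2, hidden_size)
```

For single, unpadded inputs this reduces to the plain `mean(dim=1)` used in the GPU example above.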