How much GPU memory is required for 32k context embedding?
#13
by Labmem009 - opened
I tried to use this model to embed long text, but it kept failing with OOM errors on 6x A100 with DP. Are there any suggestions for managing memory with long text?
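For a rough sense of scale, here is a back-of-the-envelope sketch of where the memory can go at 32k tokens. It assumes the model keeps Mistral-7B's 32 attention heads, has roughly 7 billion parameters, and runs an eager attention implementation that materializes the full `[batch, heads, seq, seq]` score matrix. None of that is stated in this thread, and fused kernels such as SDPA or FlashAttention avoid materializing that matrix. The function names are made up for illustration.

```python
# Rough memory arithmetic; every default below is an assumption, not a measurement.
BYTES_PER_DTYPE = {"fp32": 4, "fp16": 2, "bf16": 2}


def weights_gib(num_params, dtype="fp16"):
    """Memory needed just to hold the model weights, in GiB."""
    return num_params * BYTES_PER_DTYPE[dtype] / 2**30


def attention_scores_gib(seq_len, num_heads=32, batch_size=1, dtype="fp16"):
    """Size of ONE layer's [batch, heads, seq, seq] score matrix, in GiB.

    Only relevant when attention is computed eagerly (no fused kernel).
    """
    return batch_size * num_heads * seq_len**2 * BYTES_PER_DTYPE[dtype] / 2**30


# Assumed parameter count of about 7e9; prints the fp16 weight footprint.
print(f"weights (fp16): {weights_gib(7e9):.1f} GiB")

for seq_len in (4096, 32768):
    print(seq_len, attention_scores_gib(seq_len))
# 4096 1.0
# 32768 64.0
```

Under these assumptions, a single 32k-token sequence needs about 64 GiB for one layer's attention scores alone, on top of the weights. Data parallelism splits the batch across GPUs rather than splitting one sequence, so adding more cards would not reduce that per-sequence cost.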
Try:
with torch.no_grad():
    outputs = model(**tokens)
I can do 4K tokens with room to spare on 2x 16GB GPUs in fp16.
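A fuller version of that suggestion might look like the sketch below. It is not the poster's actual script: the helper names are hypothetical, `device_map="auto"` assumes the `accelerate` package is installed so the fp16 weights can be spread over the available GPUs, and the 4096-token cap simply mirrors the 4K figure mentioned above.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "Salesforce/SFR-Embedding-Mistral"


def load_model():
    """Load tokenizer and fp16 weights (hypothetical helper).

    device_map="auto" is an assumption: it needs `accelerate` and shards
    the layers across whatever GPUs are visible.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # half the weight memory of fp32
        device_map="auto",
    )
    model.eval()
    return tokenizer, model


def forward_no_grad(model, tokens):
    """Run the forward pass without autograd bookkeeping.

    Under no_grad, activations are not retained for a backward pass,
    which is what saves memory during inference.
    """
    with torch.no_grad():
        return model(**tokens)


def embed_hidden_states(text, max_length=4096):
    """Tokenize, truncate to max_length tokens, and return last_hidden_state."""
    tokenizer, model = load_model()
    tokens = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_length
    ).to(model.device)
    return forward_no_grad(model, tokens).last_hidden_state
```

Calling `embed_hidden_states("some long text")` would return the per-token hidden states. Pooling them into a single embedding vector is left out here.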
Is there any way to do this while using sentence-transformers? Every time I try to load it, it tries to allocate 96GB of VRAM.
embedding = HuggingFaceEmbeddings(model_name='Salesforce/SFR-Embedding-Mistral', model_kwargs={'device':f"cuda:{device_num}"})