Automatically add EOS via Tokenizer, add Sentence Transformers snippet

#2
by tomaarsen - opened

Hello!

Congratulations on the model releases!

Pull Request overview

  • Automatically add an EOS token at the end of each tokenized input (see the quick check below)
  • Add Sentence Transformers snippets
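
As a quick check of the first item (a minimal sketch; it just loads the tokenizer from this PR revision and inspects the last token id):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codefuse-ai/F2LLM-0.6B", revision="refs/pr/2")

# With this PR, the EOS token is appended automatically, no extra arguments needed
input_ids = tokenizer("What is F2LLM used for?")["input_ids"]
print(input_ids[-1] == tokenizer.eos_token_id)
# True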

Details

You can use the following snippets; note the 'revision' argument, which loads the model directly from this PR:

With Sentence Transformers

To encode text using F2LLM with the Sentence Transformers library:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("codefuse-ai/F2LLM-0.6B", model_kwargs={"torch_dtype": "bfloat16"}, revision="refs/pr/2")

# Some sample query and documents
query = "What is F2LLM used for?"
documents = [
    'We present F2LLM, a family of fully open embedding LLMs that achieve a strong balance between model size, training data, and embedding performance.',
    'Model checkpoints, training datasets, and training code are released, positioning F2LLM as a strong, reproducible, and budget-friendly baseline for future research in text embedding models.',
    'F2LLM is a model for computing text embeddings that can be used for various NLP tasks such as information retrieval, semantic search, and text classification.'
]

# Encode the query and documents separately; the encode_query method applies the query prompt
query_embedding = model.encode_query(query)
document_embeddings = model.encode_document(documents)
print(query_embedding.shape, document_embeddings.shape)
# (1024,) (3, 1024)

# Compute cosine similarity between the query and documents
similarity = model.similarity(query_embedding, document_embeddings)
print(similarity)
# tensor([[0.5132, 0.5376, 0.8017]])
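
The query prompt that encode_query applies is stored on the model itself. If you want to inspect it (a small sketch; it assumes the prompt is registered under the standard "query" key, matching the prompt string shown in the Transformers snippet below):

print(model.prompts["query"])
# Instruct: Given a web search query, retrieve relevant passages that answer the query
# Query: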

With Transformers

Or directly with the Transformers library:

from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F


model_path = "codefuse-ai/F2LLM-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_path, revision="refs/pr/2")
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map={'': 0}, revision="refs/pr/2")

query = "What is F2LLM used for?"
query_prompt = "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:"
documents = [
    'We present F2LLM, a family of fully open embedding LLMs that achieve a strong balance between model size, training data, and embedding performance.',
    'Model checkpoints, training datasets, and training code are released, positioning F2LLM as a strong, reproducible, and budget-friendly baseline for future research in text embedding models.',
    'F2LLM is a model for computing text embeddings that can be used for various NLP tasks such as information retrieval, semantic search, and text classification.'
]

def encode(sentences):
    batch_size = len(sentences)
    # Tokenize with padding; thanks to the tokenizer change, EOS is appended automatically
    tokenized_inputs = tokenizer(sentences, padding=True, return_tensors='pt').to(model.device)
    last_hidden_state = model(**tokenized_inputs).last_hidden_state
    # Last-token pooling: the EOS sits at the last non-padding position
    # (attention_mask.sum(dim=1) - 1 assumes right-side padding)
    eos_positions = tokenized_inputs.attention_mask.sum(dim=1) - 1
    embeddings = last_hidden_state[torch.arange(batch_size, device=model.device), eos_positions]
    # L2-normalize so a plain dot product equals cosine similarity
    embeddings = F.normalize(embeddings, p=2, dim=1)
    return embeddings

# Encode the query and documents
query_embedding = encode([query_prompt + query])
document_embeddings = encode(documents)
print(query_embedding.shape, document_embeddings.shape)
# torch.Size([1, 1024]) torch.Size([3, 1024])

# Compute cosine similarity between the query and documents
similarity = query_embedding @ document_embeddings.T
print(similarity)
# tensor([[0.5039, 0.5312, 0.7930]], device='cuda:0', dtype=torch.bfloat16,
#        grad_fn=<MmBackward0>)

The change to the tokenizer means that the EOS token is included automatically, which both simplifies the Transformers code and enables integration with Sentence Transformers and related libraries. Note that there's a small difference in outputs between Sentence Transformers and pure Transformers: it is caused by bf16 and should disappear if you use fp32. I wasn't able to remove this discrepancy after quite a while of trying, but either way, the results from both Sentence Transformers and Transformers remain close to the fp32 outputs.
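
To check the fp32 behaviour yourself, here is a minimal sketch; the only change from the snippets above is dropping the bfloat16 override, so the weights load in the default fp32:

from sentence_transformers import SentenceTransformer

# Omitting the torch_dtype override loads the weights in fp32
model = SentenceTransformer("codefuse-ai/F2LLM-0.6B", revision="refs/pr/2")

query_embedding = model.encode_query("What is F2LLM used for?")
document_embeddings = model.encode_document([
    'F2LLM is a model for computing text embeddings that can be used for various NLP tasks such as information retrieval, semantic search, and text classification.'
])
print(model.similarity(query_embedding, document_embeddings))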

I also added snippets for usage with Sentence Transformers to the README.

cc @Geralt-Targaryen

  • Tom Aarsen
tomaarsen changed pull request status to open
