---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:47610
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: '[MENTION] Gustavus And Louise Pfeiffer Research Foundation [CITY]
    Bangor [COUNTRY] United States'
  sentences:
  - '[MENTION] Gustavus And Louise Pfeiffer Research Foundation [CITY] Bangor [COUNTRY]
    United States'
  - '[MENTION] Fifth Tianjin Central Hospital [CITY] Tianjin [COUNTRY] China'
  - '[MENTION] Purdue Research Foundation [ACRONYM] PRF [CITY] West Lafayette [COUNTRY]
    United States'
- source_sentence: '[MENTION] ইন্টার-ইউরিভার্সিটি সেন্টার ফর অ্যাস্ট্রোনমি অ্যান্ড
    অ্যাস্ট্রোফিজিক্স [CITY] Pune [COUNTRY] India'
  sentences:
  - '[MENTION] National Centre for Radio Astrophysics [ACRONYM] NCRA TIFR [PARENT]
    Tata Institute of Fundamental Research [ACRONYM] TIFR [CITY] Pune [COUNTRY] India'
  - '[MENTION] Inter-University Centre for Astronomy and Astrophysics [ACRONYM] IUCAA
    [CITY] Pune [COUNTRY] India'
  - '[MENTION] Iskra Medical (Slovenia) [CITY] Radovljica [COUNTRY] Slovenia'
- source_sentence: '[MENTION] Raytheon Technologies (Canada) [CITY] Calgary [COUNTRY]
    Canada'
  sentences:
  - '[MENTION] Raytheon Technologies (Canada) [ACRONYM] RCL [PARENT] RTX (United States)
    [CITY] Calgary [COUNTRY] Canada'
  - '[MENTION] Yunnan Open University [CITY] Kunming [COUNTRY] China'
  - '[MENTION] ATCO (Canada) [CITY] Calgary [COUNTRY] Canada'
- source_sentence: '[MENTION] 유한양행 [CITY] Seoul'
  sentences:
  - '[MENTION] Instituto de Medicina Molecular João Lobo Antunes [ACRONYM] IMM [PARENT]
    University of Lisbon [CITY] Lisbon [COUNTRY] Portugal'
  - '[MENTION] Boehringer Ingelheim (South Korea) [PARENT] Boehringer Ingelheim (Germany)
    [CITY] Seoul [COUNTRY] South Korea'
  - '[MENTION] Yuhan (South Korea) [CITY] Seoul [COUNTRY] South Korea'
- source_sentence: '[MENTION] Hyderabad Cleft Society [COUNTRY] India'
  sentences:
  - '[MENTION] Hyderabad Cleft Society [ACRONYM] HCS [CITY] Hyderabad [COUNTRY] India'
  - '[MENTION] Hyderabad Rheumatology Center [ACRONYM] HRC [CITY] Hyderabad [COUNTRY]
    India'
  - '[MENTION] National Institute of Technology Akita College [PARENT] National Institute
    of Technology [CITY] Akita [COUNTRY] Japan'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: entity linking eval
      type: entity_linking_eval
    metrics:
    - type: pearson_cosine
      value: 0.7072780089709011
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6825742231480432
      name: Spearman Cosine
---
# SentenceTransformer
This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
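The Pooling and Normalize stages above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function names `mean_pool` and `l2_normalize` and the toy 2-dimensional vectors are assumptions made for illustration (the real model produces 1024-dimensional token vectors).

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    # Guard against an all-padding input to avoid division by zero.
    return [x / max(count, 1) for x in total]


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
sentence_vec = l2_normalize(pooled)
```

Because the final vector has unit length, the dot product of two such vectors equals their cosine similarity, which matches the similarity function listed in the model description.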
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("SIRIS-Lab/affilgood-dense-retriever")

# Run inference
sentences = [
    '[MENTION] Hyderabad Cleft Society [COUNTRY] India',
    '[MENTION] Hyderabad Cleft Society [ACRONYM] HCS [CITY] Hyderabad [COUNTRY] India',
    '[MENTION] Hyderabad Rheumatology Center [ACRONYM] HRC [CITY] Hyderabad [COUNTRY] India',
]
embeddings = model.encode(sentences)  # NumPy array, one row per sentence
print(embeddings.shape)
# (3, 1024)

# Get the pairwise cosine similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
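The input strings in this card all follow a tagged layout (`[MENTION]`, `[ACRONYM]`, `[PARENT]`, `[CITY]`, `[COUNTRY]`). The helper below is a hypothetical sketch for building such strings; the function name `format_affiliation`, its parameters, and the tag order are inferred from the examples shown here and are not a documented API of the model.

```python
def format_affiliation(mention, acronym=None, parent=None,
                       parent_acronym=None, city=None, country=None):
    """Build a tagged string; fields that are None or empty are skipped.

    The tag order mirrors the examples in this card (an assumption):
    MENTION, ACRONYM, PARENT, parent ACRONYM, CITY, COUNTRY.
    """
    fields = [
        ("[MENTION]", mention),
        ("[ACRONYM]", acronym),
        ("[PARENT]", parent),
        ("[ACRONYM]", parent_acronym),
        ("[CITY]", city),
        ("[COUNTRY]", country),
    ]
    return " ".join(f"{tag} {value}" for tag, value in fields if value)


# A sparse query and a fully described candidate record.
query = format_affiliation("Hyderabad Cleft Society", country="India")
candidate = format_affiliation(
    "Hyderabad Cleft Society", acronym="HCS", city="Hyderabad", country="India"
)
```

Strings built this way can be passed to `model.encode` like the `sentences` list above.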
## Evaluation
### Metrics
#### Semantic Similarity
* Dataset: `entity_linking_eval`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.7073 |
| **spearman_cosine** | **0.6826** |
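`pearson_cosine` and `spearman_cosine` are the Pearson and Spearman correlations between the cosine similarity predicted for each sentence pair and a gold score. The sketch below shows how the two statistics are computed in plain Python; the `predicted` and `gold` values are made-up toy numbers, not data from `entity_linking_eval`, and the helper names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


# Toy example: cosine scores for four pairs and hypothetical gold labels.
predicted = [0.91, 0.35, 0.78, 0.12]
gold = [1.0, 0.0, 1.0, 0.0]
print(pearson(predicted, gold), spearman(predicted, gold))
```

Spearman depends only on the ordering of the scores, so it is the more relevant of the two when the model is used to rank candidates.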
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 47,610 training samples
* Columns: sentence_0, sentence_1, and sentence_2
* Approximate statistics based on the first 1000 samples:
|         | sentence_0 | sentence_1 | sentence_2 |
|:--------|:-----------|:-----------|:-----------|
| type    | string     | string     | string     |
| details |            |            |            |
* Samples:
| sentence_0 | sentence_1 | sentence_2 |
|:-----------|:-----------|:-----------|
| [MENTION] The Prince Of Wales'S Institute Of Architecture [CITY] London [COUNTRY] United Kingdom | [MENTION] The Princes Foundation [CITY] London [COUNTRY] United Kingdom | [MENTION] Royal Institute of British Architects [ACRONYM] RIBA [CITY] London [COUNTRY] United Kingdom |
| [MENTION] Development Finance & Public Policies [COUNTRY] Belgium | [MENTION] Development Finance and Public Policies [ACRONYM] DEFIPP [PARENT] University of Namur [CITY] Namur [COUNTRY] Belgium | [MENTION] Service Public Federal Finances [ACRONYM] SPF [CITY] Brussels [COUNTRY] Belgium |
| [MENTION] EES [COUNTRY] United States | [MENTION] Emerald Education Systems [ACRONYM] EES [CITY] Pasadena [COUNTRY] United States | [MENTION] ESI Group (United States) [ACRONYM] ESI [PARENT] ESI Group (France) [ACRONYM] ESI [CITY] Farmington Hills [COUNTRY] United States |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
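To make the two parameters concrete: the loss scores each anchor against every candidate in the batch with cosine similarity, multiplies the scores by `scale`, and applies cross-entropy with the anchor's own positive as the target. The pure-Python sketch below follows that description under stated assumptions; `mnr_loss` and the toy vectors are illustrative and this is not the library's code.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def mnr_loss(anchors, positives, negatives=(), scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    Every other positive, and every explicit negative, acts as a negative.
    """
    candidates = list(positives) + list(negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy batch: aligned pairs give a near-zero loss, swapped pairs a large one.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

In terms of the training columns, `sentence_0` plays the role of the anchor, `sentence_1` the positive, and `sentence_2` an explicit negative.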
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `fp16`: True
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters