How to use from the
Use from the
sentence-transformers library
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("zacbrld/MNLP_M2_document_encoder")

sentences = [
    "Military activity affects the physical geology. This was first noted through the intensive shelling on the Western Front during World War I, which caused the shattering of the bedrock and changed the rocks' permeability. New minerals, rocks, and land-forms are also a byproduct of nuclear testing.",
    "Silicon can form sigma bonds to other silicon atoms (and disilane is the parent of this class of compounds). However, it is difficult to prepare and isolate SinH2n+2 (analogous to the saturated alkane hydrocarbons) with n greater than about 8, as their thermal stability decreases with increases in the number of silicon atoms.  Silanes higher in molecular weight than disilane decompose to polymeric polysilicon hydride and hydrogen.  But with a suitable pair of organic substituents in place of hydrogen on each silicon it is possible to prepare polysilanes (sometimes, erroneously called polysilenes) that are analogues of alkanes. These long chain compounds have surprising electronic properties - high electrical conductivity, for example - arising from sigma delocalization of the electrons in the chain.\nEven silicon–silicon pi bonds are possible. However, these bonds are less stable than the carbon analogues. Disilane and longer silanes are quite reactive compared to alkanes. Disilene and disilynes are quite rare, unlike alkenes and alkynes. Examples of disilynes, long thought to be too unstable to be isolated were reported in 2004.",
    "The increasing sophistication of brain-reading technologies has led many to investigate their potential applications for lie detection. Legally required brain scans arguably violate “the guarantee against self-incrimination” because they differ from acceptable forms of bodily evidence, such as fingerprints or blood samples, in an important way: they are not simply physical, hard evidence, but evidence that is intimately linked to the defendant's mind. Under US law, brain-scanning technologies might also raise implications for the Fourth Amendment, calling into question whether they constitute an unreasonable search and seizure.",
    "Military activity affects the physical geology. This was first noted through the intensive shelling on the Western Front during World War I, which caused the shattering of the bedrock and changed the rocks' permeability. New minerals, rocks, and land-forms are also a byproduct of nuclear testing."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

SentenceTransformer based on zacbrld/MNLP_M2_document_encoder

This is a sentence-transformers model finetuned from zacbrld/MNLP_M2_document_encoder. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: zacbrld/MNLP_M2_document_encoder
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zacbrld/MNLP_M2_document_encoder")
# Run inference
sentences = [
    'Burray and The Barriers\nUndiscovered Scotland: The Churchill Barriers\nOur Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine\nOkneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine',
    'Burray and The Barriers\nUndiscovered Scotland: The Churchill Barriers\nOur Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine\nOkneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine',
    'Cerebellar Purkinje neurons have been proposed to have two distinct bursting modes: dendritically driven, by dendritic Ca2+ spikes, and somatically driven, wherein the persistent Na+ current is the burst initiator and the SK K+ current is the burst terminator. Purkinje neurons may utilise these bursting forms in information coding to the deep cerebellar nuclei.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,489 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    sentence_0 sentence_1
    type string string
    details
    • min: 34 tokens
    • mean: 144.23 tokens
    • max: 256 tokens
    • min: 34 tokens
    • mean: 144.23 tokens
    • max: 256 tokens
  • Samples:
    sentence_0 sentence_1
    In related work, Smoller, Temple, and Vogler propose that this shockwave may have resulted in our part of the universe having a lower density than that surrounding it, causing the accelerated expansion normally attributed to dark energy.
    They also propose that this related theory could be tested: a universe with dark energy should give a figure for the cubic correction to redshift versus luminosity C = −0.180 at a = a whereas for Smoller, Temple, and Vogler's alternative C should be positive rather than negative. They give a more precise calculation for their wave model alternative as: the cubic correction to redshift versus luminosity at a = a is C = 0.359.
    In related work, Smoller, Temple, and Vogler propose that this shockwave may have resulted in our part of the universe having a lower density than that surrounding it, causing the accelerated expansion normally attributed to dark energy.
    They also propose that this related theory could be tested: a universe with dark energy should give a figure for the cubic correction to redshift versus luminosity C = −0.180 at a = a whereas for Smoller, Temple, and Vogler's alternative C should be positive rather than negative. They give a more precise calculation for their wave model alternative as: the cubic correction to redshift versus luminosity at a = a is C = 0.359.
    Evolution is a central organizing concept in biology. It is the change in heritable characteristics of populations over successive generations. In artificial selection, animals were selectively bred for specific traits.
    Given that traits are inherited, populations contain a varied mix of traits, and reproduction is able to increase any population, Darwin argued that in the natural world, it was nature that played the role of humans in selecting for specific traits. Darwin inferred that individuals who possessed heritable traits better adapted to their environments are more likely to survive and produce more offspring than other individuals. He further inferred that this would lead to the accumulation of favorable traits over successive generations, thereby increasing the match between the organisms and their environment.
    Evolution is a central organizing concept in biology. It is the change in heritable characteristics of populations over successive generations. In artificial selection, animals were selectively bred for specific traits.
    Given that traits are inherited, populations contain a varied mix of traits, and reproduction is able to increase any population, Darwin argued that in the natural world, it was nature that played the role of humans in selecting for specific traits. Darwin inferred that individuals who possessed heritable traits better adapted to their environments are more likely to survive and produce more offspring than other individuals. He further inferred that this would lead to the accumulation of favorable traits over successive generations, thereby increasing the match between the organisms and their environment.
    The total number of engineers employed in the U.S. in 2015 was roughly 1.6 million. Of these, 278,340 were mechanical engineers (17.28%), the largest discipline by size. In 2012, the median annual income of mechanical engineers in the U.S. workforce was $80,580. The median income was highest when working for the government ($92,030), and lowest in education ($57,090). In 2014, the total number of mechanical engineering jobs was projected to grow 5% over the next decade. As of 2009, the average starting salary was $58,800 with a bachelor's degree. The total number of engineers employed in the U.S. in 2015 was roughly 1.6 million. Of these, 278,340 were mechanical engineers (17.28%), the largest discipline by size. In 2012, the median annual income of mechanical engineers in the U.S. workforce was $80,580. The median income was highest when working for the government ($92,030), and lowest in education ($57,090). In 2014, the total number of mechanical engineering jobs was projected to grow 5% over the next decade. As of 2009, the average starting salary was $58,800 with a bachelor's degree.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
1.4535 500 0.0002
2.9070 1000 0.0
4.3605 1500 0.0007

Framework Versions

  • Python: 3.10.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.6.0
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Downloads last month
1
Safetensors
Model size
22.7M params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for zacbrld/MNLP_M2_document_encoder

Unable to build the model tree, the base model loops to the model itself. Learn more.

Papers for zacbrld/MNLP_M2_document_encoder