SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the prep-manga-recom dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: prep-manga-recom
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
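The pooling and normalization stages listed above can be sketched in plain Python. This is an illustration only, not the library's implementation: the function names `mean_pool` and `l2_normalize` and the toy 3-dimensional token vectors are hypothetical, whereas the real model operates on 384-dimensional tensors.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [t / max(count, 1) for t in total]

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input (hypothetical values): three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean over the two unmasked tokens
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
```

Masking before averaging matters because padded positions would otherwise pull the sentence vector toward whatever the padding token encodes.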
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tonnnnnnnnnnnnn/semantic_text_sim-v1")
# Run inference
sentences = [
"Yamato Akitsuki has recently moved to Tokyo, and in order to maintain his less than wealthy lifestyle, he currently works at his aunt's bathhouse. While walking by the school one evening he sees a girl by the name of Suzuka practicing the high jump, and is instantly in love. Even better is the realization that Suzuka lives next door! Determined to prove himself worthy of her affections, Yamato decides to join the school's track team and show her what he's got, but things won't be so easy; for Suzuka has a love interest of her own, and it isn't Yamato...",
'What is the name of the manga where the protagonist is involved in a car crash and wakes up in life after death?',
'What is the name of the manga with the description "As Japan and the rest of the world begins the process of rebuilding after the fall of \'Friend\', Kenji and his friends must try to uncover the identity of the second \'Friend\' and other unresolved mysteries. Before the world is once again thrown into turmoil, they must search deep into their childhood memories to find the key to save the world one more time from the threat of \'Friend\'; some mysteries cannot be left unsolved."',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, -0.0386, 0.1505],
# [-0.0386, 1.0000, 0.1664],
# [ 0.1505, 0.1664, 1.0000]])
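Because the architecture ends with a Normalize() module, the embeddings are unit length, so cosine similarity reduces to a dot product. The sketch below ranks a small corpus against a query on that basis. It is dependency-free and uses hypothetical 2-dimensional vectors in place of real `model.encode(...)` outputs; `rank_by_similarity` is an illustrative helper, not part of the library.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_by_similarity(query_vec, corpus_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query_vec, vec)) for i, vec in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy unit-length vectors standing in for encoded manga descriptions.
corpus = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
# Toy unit-length vector standing in for an encoded question.
query = [0.8, 0.6]

ranking = rank_by_similarity(query, corpus)
best_index, best_score = ranking[0]
```

For a manga recommendation use case, the corpus would hold description embeddings and the query would be an encoded question of the kind shown in the example sentences.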
Evaluation
Metrics
Semantic Similarity
- Datasets: manga-dev and manga-test
- Evaluated with EmbeddingSimilarityEvaluator
| Metric | manga-dev | manga-test |
|---|---|---|
| pearson_cosine | 0.7604 | 0.7596 |
| spearman_cosine | 0.7134 | 0.7161 |
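The two metrics above are correlations between the model's cosine similarities and the gold scores: Pearson measures linear agreement, Spearman measures agreement of rank order. The following is a minimal, dependency-free sketch of both. The evaluator itself relies on a statistics library, so treat this as an illustration; the `predicted` and `gold` values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical cosine similarities and gold scores for four sentence pairs.
predicted = [0.10, 0.40, 0.35, 0.80]
gold = [0.00, 0.30, 0.50, 0.60]

pearson_cosine = pearson(predicted, gold)
spearman_cosine = spearman(predicted, gold)
```

Spearman ignores the scale of the scores, which makes it the more forgiving metric when predictions are ordered correctly but not calibrated to the label range.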
Training Details
Training Dataset
prep-manga-recom
- Dataset: prep-manga-recom at 99869dd
- Size: 8,100 training samples
- Columns: sentence1, sentence2, and score
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |
| details | min: 7 tokens, mean: 98.51 tokens, max: 256 tokens | min: 7 tokens, mean: 34.25 tokens, max: 228 tokens | min: -0.2, mean: 0.09, max: 0.64 |
- Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| A girl struggles to survive in a zombie apocalypse and ends up finding herself a boyfriend! But there is a problem...he is undead! | What is the name of a manga about a man with no money and no place to belong who tries to end his life, however is unable to take the last step? | 0.2546822726726532 |
| The second season of Terror Man. | What is the name of a manga about a 30-year-old man who meets a stoic graphic designer and falls in love? | 0.0785623267292976 |
| After the loss of his loyal dog, Winter, Ji Seungwoo is wounded by an irreplaceable loss and sadness of his 22 year-long companionship. But one day he finds an adorable little kitten waiting in a cardboard box next to his front door. Being the gentle and kind-hearted man he was, he decides to bring him home. However, upon waking up the next morning he is no longer in bed with a cute baby kitten, but a grown, handsome man?! “I am this person’s owner!” he says! Stay tuned to watch how Seungwoo and Mr. Cat’s relationship develops…! | What is the name of the manga about a fraudulent exorcist and his student assistant? | -0.021251555532217 |

- Loss: CosineSimilarityLoss with these parameters: { "loss_fct": "torch.nn.modules.loss.MSELoss" }
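With `loss_fct` set to MSELoss, this objective is the mean squared error between the cosine similarity of each embedding pair and its target score. A minimal sketch under that reading follows. It is not the library's code: `cosine_similarity_mse` is a hypothetical helper and the 2-dimensional vectors are toy values.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_mse(pairs, labels):
    """Mean squared error between predicted cosine similarities and labels."""
    errors = [
        (cosine_similarity(u, v) - label) ** 2
        for (u, v), label in zip(pairs, labels)
    ]
    return sum(errors) / len(errors)

# Toy batch: two (sentence1, sentence2) embedding pairs with target scores.
batch = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
targets = [0.5, 0.0]

loss = cosine_similarity_mse(batch, targets)
```

The first pair is identical (similarity 1.0) but labelled 0.5, so it contributes all of the error; the second pair is orthogonal and matches its label exactly.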
Evaluation Dataset
prep-manga-recom
- Dataset: prep-manga-recom at 99869dd
- Size: 900 evaluation samples
- Columns: sentence1, sentence2, and score
- Approximate statistics based on the first 900 samples:

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |
| details | min: 7 tokens, mean: 98.39 tokens, max: 233 tokens | min: 7 tokens, mean: 34.49 tokens, max: 234 tokens | min: -0.22, mean: 0.09, max: 0.69 |
- Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| Shuichiro used to secretly go to the terrace of his school, during lunch, when skipping classes and when he didn't want to go home. There he knew Konno. Some say he was difficult to talk, that he is intimidating and scary... but there is who say he is a camera fanatic and an unexpectedly nice guy. Respect, envy, jealousy and inferiority complex will mark their friendship. A masked kid who is hiding behind a fake smile all his tormented thought and a simple-hearted friend who start to mobilize all the conflicts without realizing. | What is Midori’s relationship with Hinata? | 0.1419442445039749 |
| Baki Hanma is a generally happy student with a rather odd hobby; he likes fighting. Specifically, he likes fighting in a secret martial arts tournament that gathers the greatest fighters of the world and pits them against each other in really nasty combat. New Grappler Baki takes off where the original series leaves off. As the new Tournament champion, he's generally taken it easy until he recieves the news of five deadly, murderous, martial artist who have escaped their prisons and are now headed to Japan. Baki and his friends must deal with them before they are killed themselves. | What is the name of a manga about a girl who must face her past to get revenge? | 0.1093111932277679 |
| “Hey, don’t get confused. I’ve never thought of you as my little brother. You don’t even know your right place.” That was the dagger that hurt the most. Han Myoung Woo. A child of a wealthy family with a quick mind. But god didn’t give him everything. A near incurable genetic heart disease. He did his best in order to earn the respect of the people around him. But his father, the chairman, and his family members never took him seriously. That was when an accident suddenly struck him. And… He woke up in the body of the severely injured student, Kim Cheol Min. | What is a manga about a few thousand years has passed since an alchemist created Winter. He is now living with Jane learning what it means to be alive as a human. | -0.086394652724266 |

- Loss: CosineSimilarityLoss with these parameters: { "loss_fct": "torch.nn.modules.loss.MSELoss" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- num_train_epochs: 4
- warmup_ratio: 0.1
- fp16: True
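These settings determine the step counts and the learning-rate curve. The sketch below works through the arithmetic using the 8,100 training samples and the learning_rate of 5e-05 reported elsewhere in this card. It assumes a single device, gradient_accumulation_steps of 1, no dropped last batch, and that warmup steps are rounded up from warmup_ratio times total steps; `linear_schedule_lr` is a hypothetical helper mirroring a linear warmup followed by linear decay, not the trainer's own code.

```python
import math

train_samples = 8100   # training dataset size
batch_size = 64        # per_device_train_batch_size
epochs = 4             # num_train_epochs
warmup_ratio = 0.1     # warmup_ratio
base_lr = 5e-05        # learning_rate

# Assumption: one device and no dropped final batch.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
# Assumption: warmup length is rounded up to a whole step.
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_schedule_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)

# Fraction of an epoch completed at step 100.
epoch_at_step_100 = round(100 / steps_per_epoch, 4)
```

Under these assumptions there are 127 steps per epoch, which is consistent with the epoch value of 0.7874 logged at step 100.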
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | Validation Loss | manga-dev_spearman_cosine | manga-test_spearman_cosine |
|---|---|---|---|---|---|
| 0.7874 | 100 | 0.0107 | 0.0078 | 0.6415 | - |
| 1.5748 | 200 | 0.006 | 0.0067 | 0.6889 | - |
| 2.3622 | 300 | 0.0041 | 0.0064 | 0.7014 | - |
| 3.1496 | 400 | 0.0031 | 0.0061 | 0.7140 | - |
| 3.9370 | 500 | 0.0024 | 0.0061 | 0.7134 | - |
| -1 | -1 | - | - | - | 0.7161 |
Framework Versions
- Python: 3.10.0
- Sentence Transformers: 5.1.2
- Transformers: 4.57.1
- PyTorch: 2.5.1+cu121
- Accelerate: 1.11.0
- Datasets: 4.4.1
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}