| --- |
| license: apache-2.0 |
| language: |
| - en |
| pipeline_tag: text-classification |
| tags: |
| - url |
| - urls |
| - classification |
| new_version: CrabInHoney/urlbert-tiny-base-v4 |
| --- |
| This is a very small BERT model, intended as a base for later fine-tuning on URL analysis tasks. |
|
|
|
|
| An updated version of the earlier base model for URL analysis. |
|
|
| Old version: https://huggingface.co/CrabInHoney/urlbert-tiny-base-v2 |
|
|
| Model size: 3.69M params |
|
|
| Tensor type: F32 |
|
|
| Test example: |
|
|
| from transformers import BertTokenizerFast, BertForMaskedLM, pipeline |
| import torch |
| |
| # Use the GPU when one is available, otherwise fall back to the CPU. |
| device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') |
| print(f"Using device: {device}") |
| |
| model_name = "CrabInHoney/urlbert-tiny-base-v3" |
| |
| tokenizer = BertTokenizerFast.from_pretrained(model_name) |
| model = BertForMaskedLM.from_pretrained(model_name) |
| model.to(device) |
| |
| fill_mask = pipeline( |
|     "fill-mask", |
|     model=model, |
|     tokenizer=tokenizer, |
|     device=0 if torch.cuda.is_available() else -1 |
| ) |
| |
| # URLs in which one token is replaced by [MASK] for the model to predict. |
| sentences = [ |
|     "http://example.[MASK]/" |
| ] |
| |
| for sentence in sentences: |
|     print(f"\nInput sentence: {sentence}") |
|     results = fill_mask(sentence) |
|     # Each result holds a candidate token and its probability. |
|     for result in results: |
|         token_str = result['token_str'] |
|         score = result['score'] |
|         print(f"Predicted token: {token_str}, probability: {score:.4f}") |
| |
| |
| Output: |
|
|
| Input sentence: http://example.[MASK]/ |
|
|
| Predicted token: com, probability: 0.7018 |
|
|
| Predicted token: org, probability: 0.1191 |
|
|
| Predicted token: nl, probability: 0.0406 |
|
|
| Predicted token: net, probability: 0.0294 |
|
|
| Predicted token: ca, probability: 0.0190 |
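
Fine-tuning sketch:

Since the model is meant as a base for later fine-tuning, the following is a minimal sketch of how the checkpoint might be loaded with a sequence classification head. The two labels (`benign`, `malicious`), the constants, and the helper `load_for_classification` are hypothetical names chosen for illustration and are not part of the released model. The classification head is newly initialized on load, so its predictions carry no meaning until the model has been trained on labeled URLs.

```python
MODEL_NAME = "CrabInHoney/urlbert-tiny-base-v3"

# Hypothetical label set, chosen only for illustration.
ID2LABEL = {0: "benign", 1: "malicious"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def load_for_classification(model_name: str = MODEL_NAME):
    """Load the tokenizer and the base weights with a fresh classification head."""
    # Imported here so the label mappings above stay usable without transformers.
    from transformers import BertForSequenceClassification, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(ID2LABEL),  # size of the new, untrained output layer
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model
```

Calling `load_for_classification()` downloads the checkpoint and returns a tokenizer and model that can be passed to an ordinary training loop or to the `Trainer` class from `transformers`.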