---
inference: false
language:
- bg
license: mit
datasets:
- oscar
- chitanka
- wikipedia
tags:
- torch
---

# BERT BASE (cased) fine-tuned on Bulgarian SQuAD data

This is a BERT base model pretrained on Bulgarian text using a masked language modeling (MLM) objective. BERT was introduced in
[this paper](https://arxiv.org/abs/1810.04805) and first released in
[this repository](https://github.com/google-research/bert). This model is cased: it distinguishes
between bulgarian and Bulgarian. The training data is Bulgarian text from [OSCAR](https://oscar-corpus.com/post/oscar-2019/), [Chitanka](https://chitanka.info/) and [Wikipedia](https://bg.wikipedia.org/).

It was then fine-tuned on private Bulgarian SQuAD data.

Finally, it was compressed via [progressive module replacing](https://arxiv.org/abs/2002.02925).

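As a rough illustration of the idea behind progressive module replacing, the sketch below swaps each module of a larger "predecessor" stack for its smaller "successor" counterpart with some probability during a forward pass, and raises that probability over training steps. It is not the code used to compress this model: plain Python callables stand in for Transformer layers, and the names `replace_rate` and `theseus_forward` as well as the `base_rate` and `slope` values are hypothetical.

```python
import random


def replace_rate(step, base_rate=0.5, slope=0.001):
    """Linear schedule for the replacement probability, capped at 1.0.

    base_rate and slope are illustrative values, not taken from this model.
    """
    return min(1.0, base_rate + slope * step)


def theseus_forward(x, predecessors, successors, p, rng=random):
    """Pass x through the stack, using the successor module with probability p."""
    for pred, succ in zip(predecessors, successors):
        module = succ if rng.random() < p else pred
        x = module(x)
    return x


# Toy stand-ins for layers: each "module" is just a function on a number.
predecessors = [lambda v: v + 1, lambda v: v + 1]
successors = [lambda v: v + 10, lambda v: v + 10]

for step in (0, 250, 500):
    p = replace_rate(step)
    print(step, p, theseus_forward(0, predecessors, successors, p))
```

Once the probability reaches 1.0, only the successor modules are used, which is the point at which the smaller stack can stand in for the original one.
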
### How to use

Here is how to use this model in PyTorch:

```python
>>> from transformers import pipeline
>>>
>>> # device=0 selects the first GPU; pass device=-1 to run on CPU.
>>> model = pipeline(
...     'question-answering',
...     model='rmihaylov/bert-base-squad-theseus-bg',
...     tokenizer='rmihaylov/bert-base-squad-theseus-bg',
...     device=0,
...     revision=None)
>>>
>>> # The question asks, roughly, "What is used to track the pandemic?"
>>> question = "С какво се проследява пандемията?"
>>> context = "Епидемията гасне, обяви при обявяването на данните тази сутрин Тодор Кантарджиев, член на Националния оперативен щаб. Той направи този извод на база на данните от математическите модели, с които се проследява развитието на заразата. Те показват, че т. нар. ефективно репродуктивно число е вече в границите 0.6-1. Тоест, 10 души заразяват 8, те на свой ред 6 и така нататък. "
>>>
>>> output = model(question=question, context=context)
>>> print(output)
{'score': 0.85157310962677, 'start': 162, 'end': 186, 'answer': ' математическите модели,'}
```

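The `start` and `end` values in the output above appear to be character offsets into `context`, with `end` exclusive: 186 - 162 = 24, which matches the length of the returned answer including its leading space. A minimal sketch of recovering the answer span from those offsets follows; `answer_span` is a hypothetical helper, and the `result` dictionary is hand-written to mimic the pipeline's output keys rather than produced by the model.

```python
def answer_span(context, result):
    """Slice the answer out of context using the start/end character offsets."""
    return context[result["start"]:result["end"]]


# Hand-written example with the same keys as the pipeline output.
context = "София е столицата на България."
result = {"score": 0.9, "start": 0, "end": 5, "answer": "София"}

span = answer_span(context, result)
print(span.strip())
```

Calling `strip()` on the span removes stray leading or trailing whitespace such as the space at the start of the answer shown above.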