---
language: en
tags:
- roberta-base
- roberta-base-epoch_75
license: mit
datasets:
- wikipedia
- bookcorpus
---

# RoBERTa, Intermediate Checkpoint - Epoch 75

This model is part of our reimplementation of the [RoBERTa model](https://arxiv.org/abs/1907.11692),
trained on Wikipedia and the Book Corpus only.
We trained this model for almost 100K steps, corresponding to 83 epochs.
We release all 84 checkpoints (including the randomly initialized weights before training)
to enable studying the training dynamics of such models, as well as other possible use cases.
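
To track how predictions evolve over training, each checkpoint can be loaded by its epoch number. Below is a minimal sketch, assuming checkpoints are published for every epoch under the `yanaiela/roberta-base-epoch_<N>` naming (the same pattern as the usage example further down); the probe sentence is only illustrative:

```python
from transformers import pipeline

# Probe the same masked sentence at several points in training
# (repository names are assumed to follow the epoch_<N> pattern).
for epoch in [0, 25, 50, 75]:
    unmasker = pipeline(
        "fill-mask",
        model=f"yanaiela/roberta-base-epoch_{epoch}",
        top_k=3,
    )
    predictions = unmasker("Paris is the <mask> of France.")
    print(epoch, [p["token_str"] for p in predictions])
```
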
These models were trained as part of a work that studies how simple data statistics,
such as co-occurrences, affect model predictions, as described in the paper
[Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions](https://arxiv.org/abs/2207.14251).

This is RoBERTa-base epoch_75.

## Model Description

This model was captured during a reproduction of
[RoBERTa-base](https://huggingface.co/roberta-base), for English: it
is a Transformers model pretrained on a large corpus of English data, using the
Masked Language Modelling (MLM) objective.

The intended uses, limitations, training data and training procedure for the fully trained model are similar
to [RoBERTa-base](https://huggingface.co/roberta-base). Two major
differences from the original model:

* We trained our model for 100K steps instead of 500K.
* We use only Wikipedia and the Book Corpus, which are publicly available corpora.

### How to use

Following the usage example of [RoBERTa-base](https://huggingface.co/roberta-base), here is an example based on
PyTorch:

```python
from transformers import pipeline

# Top-10 predictions for the masked token, on CPU (device=-1).
model = pipeline("fill-mask", model='yanaiela/roberta-base-epoch_75', device=-1, top_k=10)
model("Hello, I'm the <mask> RoBERTa-base language model")
```
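
Equivalently, the masked-token predictions can be computed directly from the model outputs rather than through the pipeline. The following is a minimal sketch (not part of the original card) using `AutoModelForMaskedLM`, mirroring the sentence and top-10 setting above:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "yanaiela/roberta-base-epoch_75"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

text = "Hello, I'm the <mask> RoBERTa-base language model"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the <mask> position and take the 10 most likely tokens there.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
probs = logits[0, mask_positions[0]].softmax(dim=-1)
top = probs.topk(10)
print([tokenizer.decode(idx).strip() for idx in top.indices.tolist()])
```
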
## Citation info

```bibtex
@article{2207.14251,
  Author = {Yanai Elazar and Nora Kassner and Shauli Ravfogel and Amir Feder and Abhilasha Ravichander and Marius Mosbach and Yonatan Belinkov and Hinrich Schütze and Yoav Goldberg},
  Title = {Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions},
  Year = {2022},
  Eprint = {arXiv:2207.14251},
}
```