crossroderick/dalat5

Text Generation

text2text-generation

transliteration


crossroderick committed on Apr 26

Commit dcddc1a · 1 Parent(s): d74a002

Updated info on CC100

Files changed (1)

README.md +1 -1


@@ -101,7 +101,7 @@ print(output)
 Тәуелсіз жоба болғанына қарамастан, DalaT5 өте маңызды үш деректер жиынтығын пайдаланады / Despite being an independent project, DalaT5 makes use of three very important datasets:
-- The first ~1.5 million records of the Kazakh subset of the CC100 dataset by [Conneau et al. (2020)](https://paperswithcode.com/paper/unsupervised-cross-lingual-representation-1)
+- The first ~1.8 million records of the Kazakh subset of the CC100 dataset by [Conneau et al. (2020)](https://paperswithcode.com/paper/unsupervised-cross-lingual-representation-1)
 - The raw, Kazakh-focused part of the [Kazakh Parallel Corpus (KazParC)](https://huggingface.co/datasets/issai/kazparc) from Nazarbayev University's Institute of Smart Systems and Artificial Intelligence (ISSAI), graciously made available on Hugging Face
 - The Wikipedia dump of articles in the Kazakh language, obtained via the `wikiextractor` Python package