crossroderick/dalat5

Text Generation

text2text-generation

transliteration


crossroderick committed on Apr 25

Commit 6b8b15a · 1 Parent(s): da9c20a

Slight update to the readme

Files changed (1)

README.md +1 -1

@@ -111,7 +111,7 @@ print(output)
 Деректер жиынының жалпы өлшемін ескере отырып, олар осы үлгінің репозиторийіне қосылмаған. Дегенмен, DalaT5-ті өзіңіз дәл баптағыңыз келсе, келесі әрекеттерді орындаңыз / Given the total size of the datasets, they haven't been included in this model's repository. However, should you wish to fine-tune DalaT5 yourself, please do the following:
-1. `get_data.sh` қабық сценарий файлын «src/data» қалтасында іске қосыңыз / Run the `get_data.sh` shell script file in the "src/data" folder
+1. `get_data.sh` қабық сценарий файлын "src/data" қалтасында іске қосыңыз / Run the `get_data.sh` shell script file in the "src/data" folder
 2. Сол қалтадағы `generate_cyr_lat_pairs.py` файлын іске қосыңыз / Run the `generate_cyr_lat_pairs.py` file in the same folder
 3. Қазақ корпус файлын тазалау және деректер жинағын араластыру үшін `generate_clean_corpus.sh` іске қосыңыз / Run `generate_clean_corpus.sh` to clean the Kazakh corpus file and shuffle the dataset
 4. Токенизаторды тазартылған корпусқа үйрету үшін `train_tokeniser.py` іске қосыңыз / Run `train_tokeniser.py` to train the tokeniser on the cleaned corpus