Create SFT/prepare_sft_dataset.py
SFT/prepare_sft_dataset.py
ADDED
@@ -0,0 +1,9 @@
+'''
+File Name: prepare_sft_dataset.py Author: Nikhil Malhotra
+Date: 21/7/2024
+Purpose: Create a high-quality SFT dataset for Project Indus.
+The source dataset is obtained from Hugging Face and provides high-quality SFT data.
+The dataset is then translated into the requisite dialects, as supported by Google.
+The dataset is also split into train and test sets, and the requisite files are created.
+Each file name carries the source, the dialect it was translated into, and the split type.
+'''
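The commit adds only the docstring, so the steps it describes (translate, split into train and test, name each file after source, dialect and split) are not implemented yet. A minimal sketch of how they might fit together is below. Everything in it is an assumption for illustration: the helper names `split_rows`, `output_name` and `prepare`, the 80/20 split, the `.jsonl` extension, and the underscore-separated naming pattern are hypothetical, and the `translate` argument is a placeholder callable rather than a real Google API call.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows deterministically and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def output_name(source, dialect, split, ext="jsonl"):
    """Build a file name carrying the source, the dialect and the split type.

    The pattern <source>_<dialect>_<split>.<ext> is an assumed convention.
    """
    return f"{source.replace('/', '_')}_{dialect}_{split}.{ext}"


def prepare(rows, source, dialect, translate=lambda text: text):
    """Translate every row, split the result, and map file names to rows.

    `translate` is a stand-in for whatever translation service is used;
    the default leaves the text unchanged.
    """
    translated = [translate(row) for row in rows]
    train, test = split_rows(translated)
    return {
        output_name(source, dialect, "train"): train,
        output_name(source, dialect, "test"): test,
    }


# Example call with a hypothetical source name and dialect code.
files = prepare(["a", "b", "c", "d", "e"], "org/data", "hi", translate=str.upper)
```

Returning a mapping of file names to rows keeps the sketch free of disk and network access; writing each entry out is a separate step.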