Update README.md

README.md (CHANGED)

```diff
@@ -17,8 +17,10 @@ library_name: peft
 
 # Model Card for SinLlama
 
-SinLlama is the first large language model specifically extended for Sinhala. It is based on Meta-Llama-3-8B and adapted through tokenizer vocabulary extension and continual pretraining on a 10M sentence Sinhala corpus. SinLlama significantly improves coverage and performance for Sinhala NLP tasks compared to base and instruct versions of Llama-3-8B.
+SinLlama is the first large language model specifically extended for Sinhala. It is based on Meta-Llama-3-8B and adapted through tokenizer vocabulary extension and continual pretraining on a 10M sentence Sinhala corpus. SinLlama significantly improves coverage and performance for Sinhala NLP tasks compared to base and instruct versions of Llama-3-8B.
 
+*DISCLAIMER*
+This is a base model, which has NOT been instruct-tuned. So you still need to do task-specific fine-tuning.
 ---
 
 ## Model Details
@@ -50,13 +52,9 @@ Subsequent fine-tuning on Sinhala classification datasets (news categorization,
 
 ## Uses
 
-### Direct Use
-- Sinhala text generation
-- Sinhala text classification
-- Sentiment analysis, news categorization, and writing style classification
 
 ### Downstream Use
-- Instruction tuning for Sinhala dialogue systems
+- Instruction tuning for Sinhala dialogue systems, text classification, etc
 - Cross-lingual applications involving Sinhala
 - Educational and research applications in low-resource NLP
 
@@ -74,7 +72,7 @@ Subsequent fine-tuning on Sinhala classification datasets (news categorization,
 - **Risk:** Misuse in spreading misinformation or biased outputs in Sinhala.
 
 ### Recommendations
-Users should carefully evaluate outputs before deployment, especially in sensitive or safety-critical applications. Fine-tuning with task/domain-specific Sinhala data is
+Users should carefully evaluate outputs before deployment, especially in sensitive or safety-critical applications. Fine-tuning with task/domain-specific Sinhala data is required for robustness.
 
 ---
 
```
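The card lists `library_name: peft`, so SinLlama presumably ships as a PEFT adapter over Meta-Llama-3-8B together with the extended Sinhala tokenizer. Below is a minimal loading sketch under those assumptions; the adapter repo id is a placeholder (the card does not give one), and loading the extended tokenizer from the adapter repo is likewise an assumption based on the vocabulary-extension description.

```python
# Minimal loading sketch, not the card's official instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/Meta-Llama-3-8B"   # base model named in the card
ADAPTER_ID = "<sinllama-adapter-repo>"   # hypothetical placeholder: use the real repo id

# Assumption: the extended Sinhala tokenizer ships with the adapter repo.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)

base = AutoModelForCausalLM.from_pretrained(BASE_ID)
# The vocabulary was extended, so grow the embedding matrix to match
# the tokenizer before attaching the adapter weights.
base.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(base, ADAPTER_ID)

# Base (non-instruct) model: expect plain text continuation, not chat behavior.
prompt = "ශ්‍රී ලංකාව"  # any Sinhala prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Resizing the embeddings before `PeftModel.from_pretrained` keeps the extended vocabulary rows aligned with what the adapter expects.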
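The disclaimer and the updated Recommendations line both say task-specific fine-tuning is required before deployment. A minimal LoRA sketch follows, with illustrative hyperparameters rather than the authors' training recipe; the rank, alpha, and target modules are common Llama choices, not values from the card, and in practice you would start from the SinLlama-adapted weights rather than the raw base shown here.

```python
# Minimal LoRA fine-tuning sketch with illustrative hyperparameters;
# not the authors' recipe. Train on your own task-specific Sinhala data.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                 # rank: illustrative, tune for your data size
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # common Llama choice, not from the card
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights train

# From here, train with a standard loop or transformers.Trainer on
# task/domain-specific Sinhala data, then evaluate before deployment.
```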