A fine-tuned XLM-RoBERTa model that identifies 11 Indonesian regional languages plus English.
from transformers import pipeline

# Load the model from the Hugging Face Hub (repository ID taken from the citation URL below)
classifier = pipeline("text-classification", model="nahiar/xlm-roberta-indonesian-languages")
# Single prediction
result = classifier("Sugeng enjing, piye kabare?")
print(result)
# Output: [{'label': 'javanese', 'score': 0.9876}]
# Batch prediction
texts = [
    "Selamat pagi, apa kabar?",
    "Wilujeng enjing, kumaha damang?",
    "Good morning, how are you?",
]
results = classifier(texts)
for text, result in zip(texts, results):
    print(f"{text} -> {result['label']} ({result['score']:.4f})")
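
The pipeline returns one dictionary per input with a 'label' and a 'score'. Short or mixed-language inputs may get a low score, so it can be useful to treat those predictions as undecided. The following is a minimal sketch, assuming the output shape shown above; the helper name `label_or_unknown`, the 0.80 threshold, and the "unknown" fallback are illustrative assumptions and not part of this model.

```python
from typing import Dict, List, Union

# One pipeline prediction, e.g. {'label': 'javanese', 'score': 0.9876}
Prediction = Dict[str, Union[str, float]]


def label_or_unknown(
    predictions: List[Prediction],
    threshold: float = 0.80,   # hypothetical cut-off; tune on your own data
    fallback: str = "unknown",  # hypothetical placeholder label
) -> List[str]:
    """Return each predicted label, or `fallback` when its score is below `threshold`."""
    labels = []
    for pred in predictions:
        if pred["score"] >= threshold:
            labels.append(pred["label"])
        else:
            labels.append(fallback)
    return labels
```

With the batch example above, `label_or_unknown(results)` would return one label per input text, in the same order as `texts`.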
If you use this model, please cite:
@misc{indonesian-language-id,
  author = {Raihan Hidayatullah Djunaedi},
  title = {Indonesian Regional Languages Identifier},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/nahiar/xlm-roberta-indonesian-languages}
}
Base model: FacebookAI/xlm-roberta-base