Indonesian Regional Languages Identifier

Fine-tuned XLM-RoBERTa model that identifies 11 languages of Indonesia (Indonesian plus 10 regional languages) and English, for 12 labels in total.

Supported Languages

  • 🇮🇩 Indonesian (Bahasa Indonesia)
  • Acehnese (Bahasa Aceh)
  • Balinese (Basa Bali)
  • Banjarese (Bahasa Banjar)
  • Buginese (Basa Ugi)
  • Javanese (Basa Jawa)
  • Madurese (Basa Madhura)
  • Minangkabau (Baso Minang)
  • Ngaju (Basa Ngaju)
  • Sundanese (Basa Sunda)
  • Toba Batak (Hata Batak Toba)
  • 🇬🇧 English

Model Performance

  • Accuracy: 0.9783
  • F1 Macro: 0.9783
  • F1 Weighted: 0.9783
  • Precision: 0.9785
  • Recall: 0.9783
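A minimal, dependency-free sketch of how these five numbers relate, assuming the standard definitions (the card does not publish its evaluation code or say which split the scores come from). If the reported Recall is the support-weighted average, it is by definition equal to accuracy, which would explain why both read 0.9783. Macro and weighted averages also coincide whenever every label has the same number of evaluation examples; whether this evaluation set is balanced is not stated. The function `classification_metrics` and the toy label strings (other than `javanese`, which appears in the Usage output) are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro- and support-weighted precision, recall and F1.

    Division-by-zero cases (a label never predicted, or never present) score 0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)    # true examples per label
    predicted = Counter(y_pred)  # predictions per label
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    n = len(y_true)

    per_class = {}
    for label in labels:
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    metrics = {"accuracy": sum(correct.values()) / n}
    for name in ("precision", "recall", "f1"):
        # Macro: plain mean over labels.
        metrics[f"{name}_macro"] = sum(c[name] for c in per_class.values()) / len(labels)
        # Weighted: mean over labels, weighted by each label's true-example count.
        metrics[f"{name}_weighted"] = sum(
            per_class[label][name] * support[label] for label in labels
        ) / n
    return metrics


# Toy, balanced data (two examples per label); labels are illustrative.
y_true = ["javanese", "javanese", "sundanese", "sundanese", "english", "english"]
y_pred = ["javanese", "sundanese", "sundanese", "sundanese", "english", "english"]
m = classification_metrics(y_true, y_pred)
print(round(m["accuracy"], 4), round(m["recall_weighted"], 4))  # 0.8333 0.8333
print(round(m["f1_macro"], 4), round(m["f1_weighted"], 4))      # 0.8222 0.8222
```

To score the real model, `y_pred` would be the `label` fields returned by the `text-classification` pipeline shown in the Usage section, run over a labeled held-out set.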

Usage

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline("text-classification", model="nahiar/xlm-roberta-indonesian-languages")

# Single prediction (Javanese)
result = classifier("Sugeng enjing, piye kabare?")
print(result)
# Example output (scores may differ): [{'label': 'javanese', 'score': 0.9876}]

# Batch prediction: the pipeline returns one {'label', 'score'} dict per input
texts = [
    "Selamat pagi, apa kabar?",         # Indonesian
    "Wilujeng enjing, kumaha damang?",  # Sundanese
    "Good morning, how are you?",       # English
]

results = classifier(texts)
for text, result in zip(texts, results):
    print(f"{text} -> {result['label']} ({result['score']:.4f})")
```

Training Details

  • Base Model: xlm-roberta-base
  • Training Samples: 6000
  • Validation Samples: 1200
  • Epochs: 5
  • Learning Rate: 2e-05
  • Batch Size: 16
  • Training Date: 2025-11-24 07:04:09
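The hyperparameters above fix the number of optimizer updates in the run. A back-of-the-envelope sketch, assuming a single device, no gradient accumulation, and that a final partial batch is kept (the card states none of these); `optimizer_steps` is a hypothetical helper, not part of any training library:

```python
import math


def optimizer_steps(num_samples: int, batch_size: int, epochs: int) -> tuple[int, int]:
    """Return (steps per epoch, total steps), counting a final partial batch as a step."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Training Samples, Batch Size and Epochs from the list above.
per_epoch, total = optimizer_steps(num_samples=6000, batch_size=16, epochs=5)
print(per_epoch, total)  # 375 1875
```

Because 6,000 divides evenly by 16, the partial-batch assumption makes no difference for this particular run.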

Citation

If you use this model, please cite:

@misc{indonesian-language-id,
    author = {Raihan Hidayatullah Djunaedi},
    title = {Indonesian Regional Languages Identifier},
    year = {2025},
    publisher = {Hugging Face},
    url = {https://huggingface.co/nahiar/xlm-roberta-indonesian-languages}
}

Model Files

  • Format: Safetensors
  • Model size: 0.3B params
  • Tensor type: F32