This is the trained model for the seq2seq approach of our Sinhala transliteration solution, submitted to the shared task of the IndoNLP Workshop 2025, co-located with COLING 2025.
The official codebase is available at https://github.com/kasunw22/Sinhala-Transliterator
Please cite our paper if you use this model, and feel free to star our original repository.

@inproceedings{de-mel-etal-2025-sinhala,
    title = "{S}inhala Transliteration: A Comparative Analysis Between Rule-based and {S}eq2{S}eq Approaches",
    author = "De Mel, Yomal  and
      Wickramasinghe, Kasun  and
      de Silva, Nisansa  and
      Ranathunga, Surangika",
    editor = "Weerasinghe, Ruvan  and
      Anuradha, Isuri  and
      Sumanathilaka, Deshan",
    booktitle = "Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages",
    month = jan,
    year = "2025",
    address = "Abu Dhabi",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.indonlp-1.19/",
    pages = "166--173",
    abstract = "Due to reasons of convenience and lack of tech literacy, transliteration (i.e., Romanizing native scripts instead of using localization tools) is eminently prevalent in the context of low-resource languages such as Sinhala, which have their own writing script. In this study, our focus is on Romanized Sinhala transliteration. We propose two methods to address this problem: Our baseline is a rule-based method, which is then compared against our second method where we approach the transliteration problem as a sequence-to-sequence task akin to the established Neural Machine Translation (NMT) task. For the latter, we propose a Transformer based Encode-Decoder solution. We witnessed that the Transformer-based method could grab many ad-hoc patterns within the Romanized scripts compared to the rule-based method."
}

Model size: 0.5B params (Safetensors, tensor type F32)
Base model: facebook/m2m100_418M (this model is fine-tuned from it)
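
Since the base model is facebook/m2m100_418M, the checkpoint can probably be loaded like any other M2M100 model through the `transformers` library. The following is only a minimal sketch, not the official inference script (see the repository above for that). It assumes that the Hub repository `kasunw/sinhala-transliterator` ships the M2M100 tokenizer files, and that Romanized input is tagged with the language code `en` and Sinhala output with `si`; those two codes, the helper names `load_transliterator` and `transliterate`, and the `max_new_tokens` default are our own illustrative choices and may differ from what was used in training.

```python
def load_transliterator(model_id="kasunw/sinhala-transliterator"):
    """Load tokenizer and model from the Hugging Face Hub.

    Requires the `transformers` and `torch` packages. The import is done
    lazily so that `transliterate` below stays usable without them.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def transliterate(text, tokenizer, model,
                  src_lang="en", tgt_lang="si", max_new_tokens=128):
    """Convert Romanized Sinhala text to Sinhala script.

    ASSUMPTION: the language codes "en" (source) and "si" (target) are
    guesses based on the M2M100 base model, not confirmed by the authors.
    """
    # M2M100 tokenizers prepend a source-language token to the input.
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt")

    # Forcing the first generated token selects the target language.
    output_ids = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

With network access and the packages installed, usage would look like `tokenizer, model = load_transliterator()` followed by `transliterate("mama gedara yanawa", tokenizer, model)`, where the input string is just an illustrative Romanized phrase.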