- This is the trained model for the seq2seq approach of our Sinhala transliteration solution, which was submitted to the Shared Task of the IndoNLP Workshop 2025, co-located with COLING 2025.
- The official codebase can be accessed from https://github.com/kasunw22/Sinhala-Transliterator
- If you use this model, please cite our paper, and feel free to star our original repository:
```bibtex
@inproceedings{de-mel-etal-2025-sinhala,
    title = "{S}inhala Transliteration: A Comparative Analysis Between Rule-based and {S}eq2{S}eq Approaches",
    author = "De Mel, Yomal and
      Wickramasinghe, Kasun and
      de Silva, Nisansa and
      Ranathunga, Surangika",
    editor = "Weerasinghe, Ruvan and
      Anuradha, Isuri and
      Sumanathilaka, Deshan",
    booktitle = "Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages",
    month = jan,
    year = "2025",
    address = "Abu Dhabi",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.indonlp-1.19/",
    pages = "166--173",
    abstract = "Due to reasons of convenience and lack of tech literacy, transliteration (i.e., Romanizing native scripts instead of using localization tools) is eminently prevalent in the context of low-resource languages such as Sinhala, which have their own writing script. In this study, our focus is on Romanized Sinhala transliteration. We propose two methods to address this problem: Our baseline is a rule-based method, which is then compared against our second method where we approach the transliteration problem as a sequence-to-sequence task akin to the established Neural Machine Translation (NMT) task. For the latter, we propose a Transformer based Encode-Decoder solution. We witnessed that the Transformer-based method could grab many ad-hoc patterns within the Romanized scripts compared to the rule-based method."
}
```
- Model: kasunw/sinhala-transliterator (base model: facebook/m2m100_418M)
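As a rough illustration of how a checkpoint built on facebook/m2m100_418M is typically queried, here is a minimal sketch using the Hugging Face `transformers` library. It assumes the checkpoint ships a standard M2M100 tokenizer and seq2seq head. The helper names `batches` and `transliterate` are hypothetical, and the language-code handling (target `si`, optional `src_lang`) is an assumption; the official repository is the authority on how inputs were prepared during training.

```python
from typing import Iterator, List, Optional, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def transliterate(
    texts: Sequence[str],
    model_id: str = "kasunw/sinhala-transliterator",
    src_lang: Optional[str] = None,  # assumption: None keeps the tokenizer's saved setting
    tgt_lang: str = "si",  # assumption: M2M100 language code for Sinhala as the target
    batch_size: int = 16,
    max_new_tokens: int = 128,
) -> List[str]:
    """Hypothetical helper: map Romanized Sinhala strings to Sinhala script."""
    # Imported lazily so the pure-Python helper above works without these packages.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)  # assumed to be an M2M100Tokenizer
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()
    if src_lang is not None:
        tokenizer.src_lang = src_lang
    # M2M100 selects the output language through the first generated (BOS) token.
    forced_bos = tokenizer.get_lang_id(tgt_lang)

    outputs: List[str] = []
    for chunk in batches(texts, batch_size):
        enc = tokenizer(chunk, return_tensors="pt", padding=True)
        with torch.no_grad():
            generated = model.generate(
                **enc,
                forced_bos_token_id=forced_bos,
                max_new_tokens=max_new_tokens,
            )
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

For example, `transliterate(["mama gedara yanawa"])` downloads the checkpoint on first use and returns a list holding one Sinhala-script string. If the model was trained with a different target language code, adjust `tgt_lang` accordingly.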