fine-tuned IndicNER

fine-tuned IndicNER is a model trained to identify named entities in sentences in Indian languages. The model is fine-tuned over millions of sentences in 11 Indian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu. It is then benchmarked on a human-annotated test set as well as on several other publicly available Indian NER datasets.

Training Corpus

Our model was trained on a dataset that we mined from the existing Samanantar corpus. We used bert-base-multilingual-uncased as the starting point and fine-tuned it on this NER dataset.
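A minimal sketch of how such a fine-tuning setup can be initialised with the Hugging Face transformers library is shown below. The label set here is a hypothetical BIO tag set chosen for illustration; the actual labels come from the mined NER dataset and may differ.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical BIO label set for illustration; the real fine-tuning labels
# are defined by the dataset mined from Samanantar.
labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-uncased",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
# The token-classification head added on top of mBERT is then trained on the
# mined NER sentences (for example, with the transformers Trainer API).
```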

Downloads

Download the model from this Hugging Face repository.

Update 20 Dec 2022: We released a new paper documenting IndicNER and Naamapadam. The paper reports a different model; we will update this repository with that model soon.

Usage

You can use this Colab notebook for examples of running inference with IndicNER, or for fine-tuning a pre-trained model on the Naamapadam dataset to build your own NER models.
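For quick inference outside the notebook, a minimal sketch using the transformers pipeline is given below. The model id assumes this repo (techysanoj/fine-tuned-IndicNER), and the Hindi example sentence and printed fields are illustrative; the exact entity labels depend on the model's configuration.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Assumed model id for this repository.
model_name = "techysanoj/fine-tuned-IndicNER"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Aggregate sub-word predictions into whole-entity spans.
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

sentence = "बिहार के मुख्यमंत्री नीतीश कुमार ने पटना में बैठक की।"  # illustrative Hindi input
for entity in ner(sentence):
    print(entity["word"], entity["entity_group"], round(entity["score"], 3))
```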

License

The fine-tuned-IndicNER code (and models) are released under the MIT License.
