---
language:
- te
license: apache-2.0
base_model: emilyalsentzer/Bio_ClinicalBERT
tags:
- token-classification
- ner
- pii
- pii-detection
- de-identification
- privacy
- healthcare
- medical
- clinical
- phi
- telugu
- pytorch
- transformers
- openmed
pipeline_tag: token-classification
library_name: transformers
metrics:
- f1
- precision
- recall
model-index:
- name: OpenMed-PII-Telugu-BioClinicalBERT-110M-v1
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: AI4Privacy (Telugu subset)
      type: ai4privacy/pii-masking-400k
      split: test
    metrics:
    - type: f1
      value: 0.8683
      name: F1 (micro)
    - type: precision
      value: 0.8887
      name: Precision
    - type: recall
      value: 0.8489
      name: Recall
widget:
- text: >-
    డా. రాజేష్ శర్మ (ఆధార్: 1234 5678 9012) ను [email protected] లేదా +91 98765 43210 లో సంప్రదించవచ్చు. చిరునామా: 42 గాంధీ రోడ్, 500001 హైదరాబాద్.
  example_title: Clinical Note with PII (Telugu)
---
OpenMed-PII-Telugu-BioClinicalBERT-110M-v1
Telugu PII Detection Model | 110M Parameters | Open Source
Model Description
OpenMed-PII-Telugu-BioClinicalBERT-110M-v1 is a transformer-based token classification model, fine-tuned from emilyalsentzer/Bio_ClinicalBERT, for detecting Personally Identifiable Information (PII) in Telugu text. It identifies and classifies 54 types of sensitive information, including names, addresses, social security numbers, and medical record numbers.
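A minimal usage sketch, assuming the model is published on the Hugging Face Hub under the hypothetical repo id OpenMed/OpenMed-PII-Telugu-BioClinicalBERT-110M-v1 (confirm the exact name before use) and that transformers with a PyTorch backend is installed. It loads the model through the standard token-classification pipeline with aggregation_strategy="simple", which merges sub-word tokens into whole entity spans. The helpers load_detector, detect_pii, and group_by_label are illustrative, not part of any library; group_by_label reads each span from the original text via the start/end character offsets rather than the decoded word field.

```python
from collections import defaultdict
from functools import lru_cache

# Assumption: hypothetical Hub repo id; confirm the exact organisation and name.
MODEL_ID = "OpenMed/OpenMed-PII-Telugu-BioClinicalBERT-110M-v1"


@lru_cache(maxsize=1)
def load_detector(model_id: str = MODEL_ID):
    """Build (once) a token-classification pipeline for the PII model."""
    # Imported lazily so the pure helper below works without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge sub-word pieces into whole entity spans
    )


def detect_pii(text: str, model_id: str = MODEL_ID) -> list[dict]:
    """Return entity dicts with 'entity_group', 'score', 'word', 'start', 'end'."""
    return load_detector(model_id)(text)


def group_by_label(text: str, entities: list[dict]) -> dict[str, list[str]]:
    """Map each predicted label to the substrings of `text` it covers."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ent in entities:
        # Slice by character offsets instead of trusting the decoded 'word' field.
        grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)
```

Calling group_by_label(note, detect_pii(note)) on a Telugu clinical note should return each predicted label mapped to the matching substrings; the label names come from the model's own id2label configuration.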
Key Features
Telugu-Optimized: Fine-tuned specifically on Telugu text
Strong Performance: Achieves a micro-averaged F1 of 0.8683 (precision 0.8887, recall 0.8489) on the Telugu test split of AI4Privacy
Comprehensive Coverage: Detects 50+ entity types spanning personal, financial, medical, and contact information
Privacy-Focused: Built for de-identification workflows that support compliance with GDPR and other privacy regulations
Production-Ready: Works with the standard Hugging Face Transformers token-classification pipeline, so it can slot into existing text processing workflows
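For de-identification, the step after detection is usually replacing each detected span. A minimal sketch, assuming entity dicts in the shape returned by a Transformers token-classification pipeline with an aggregation strategy (start, end, entity_group, score); the function mask_entities and its default min_score of 0.5 are hypothetical choices, not values published for this model.

```python
def mask_entities(text: str, entities: list[dict], min_score: float = 0.5) -> str:
    """Replace each detected PII span with a [LABEL] placeholder.

    Expects dicts with 'start', 'end', 'entity_group' and 'score' keys.
    min_score is a hypothetical confidence threshold; tune it for your data.
    """
    # Drop low-confidence spans, then process the rest in reading order.
    kept = sorted(
        (e for e in entities if e["score"] >= min_score),
        key=lambda e: e["start"],
    )
    pieces: list[str] = []
    cursor = 0  # end offset of the last span written to the output
    for ent in kept:
        if ent["start"] < cursor:
            continue  # overlaps a span already masked; the one starting first wins
        pieces.append(text[cursor:ent["start"]])
        pieces.append(f"[{ent['entity_group']}]")
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Building the output left to right with a cursor keeps every original offset valid, so no re-indexing is needed after each replacement.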
Performance
Evaluated on the Telugu subset of the AI4Privacy dataset (ai4privacy/pii-masking-400k, test split):