---
library_name: transformers
license: apache-2.0
tags:
- text-classification
- sequence-classification
- safety
- child-protection
- harmful-content-detection
model-index:
- name: My Awesome Model
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: custom
      name: Child Protection Dataset
    metrics:
    - type: accuracy
      value: 0.87
    - type: precision
      value: 0.85
    - type: recall
      value: 0.84
    - type: f1
      value: 0.84
    - type: auc
      value: 0.90
---

# My Awesome Model 🚀

_A fine-tuned BERT model for harmful vs. safe child-protection text classification._

## Model Description

This model is a fine-tuned [🤗 Transformers](https://huggingface.co/transformers/) model for **text classification**. It was trained to classify short text messages into two categories, `harmful` vs. `safe`, making it useful as an assistive signal in child-protection content moderation.

- **Developed by:** Erin Clarke
- **Shared by:** Erin Clarke
- **Model type:** Text Classification (Sequence Classification)
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** [e.g., `bert-base-uncased`, `distilroberta-base`]

---

## Intended Uses

This model is intended for research and practical use in classifying text into categories. Example use cases include:

- Detecting harmful vs. safe content
- Further fine-tuning for related classification tasks (e.g., sentiment or intent)

**Not for:** production deployment in safety-critical environments without thorough evaluation.

---

## Training Data

- **Dataset:** Custom dataset of online chat and text messages labeled for **child protection**, with categories `harmful` vs. `safe`.
- **Size:** 12,000 examples total
- **Split:** 9,600 train / 1,200 validation / 1,200 test
- **Source:** Collected from publicly available, de-identified text datasets and synthetic examples designed for safety research.
- **Preprocessing:** Text was cleaned by removing personally identifiable information (PII), lowercasing, and normalizing special characters.
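The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration only: the regex patterns, placeholder tokens, and `preprocess` function are assumptions for demonstration, not the exact pipeline used to build the dataset.

```python
import re
import unicodedata

def preprocess(text: str) -> str:
    """Illustrative cleaning: character normalization, simple PII masking, lowercasing.
    The patterns below are assumptions, not the card's actual pipeline."""
    # Normalize special/unicode characters to a canonical form
    text = unicodedata.normalize("NFKC", text)
    # Mask simple PII patterns (email addresses, phone-like digit runs)
    text = re.sub(r"\S+@\S+\.\S+", "[EMAIL]", text)
    text = re.sub(r"\b\d{7,}\b", "[PHONE]", text)
    # Lowercase last, so the placeholders are affected uniformly
    return text.lower()

print(preprocess("Contact Me at Foo@Example.com or 5551234567!"))
# → contact me at [email] or [phone]!
```

Real PII removal is considerably harder than regex masking (names, addresses, usernames); a dedicated anonymization pass would normally be layered on top.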
---

## Training Procedure

- **Framework:** 🤗 Transformers (Trainer API)
- **Optimizer:** AdamW
- **Batch size:** 16
- **Learning rate:** 5e-5
- **Epochs:** 3
- **Hardware:** 1× NVIDIA Tesla T4 GPU (Google Colab environment)
- **Loss function:** CrossEntropyLoss (binary classification)
- **Early stopping:** Not used (fixed epochs)
- **Gradient clipping:** Applied at 1.0 to prevent exploding gradients
- **Mixed precision:** FP16 training enabled for efficiency

---

## Evaluation Results

The model was evaluated on the held-out **test set** of 1,200 examples. Metrics are reported for the binary classification task (`harmful` vs. `safe`).

| Metric    | Score |
|-----------|-------|
| Accuracy  | 0.87  |
| Precision | 0.85  |
| Recall    | 0.84  |
| F1 Score  | 0.84  |
| ROC AUC   | 0.90  |

- **Precision** indicates the proportion of predicted harmful messages that were truly harmful.
- **Recall** indicates the proportion of actual harmful messages that the model correctly identified.
- **F1 Score** balances precision and recall for overall effectiveness.
- **ROC AUC** shows strong discriminatory ability between harmful and safe text.

---

## Limitations & Bias

- **Bias in Data**: The dataset may reflect biases from the sources used. Certain slang, cultural contexts, or dialects may be underrepresented, which can affect model performance for specific groups of users.
- **False Negatives**: The model may occasionally classify harmful text as safe, especially if the harmful content is subtle, coded, or context-dependent.
- **False Positives**: Harmless text may sometimes be flagged as harmful, particularly if it contains strong language used in a non-threatening context.
- **Generalization**: The model has only been evaluated on the datasets described above. Performance on live, real-world child-protection data may differ.
- **Sensitive Use Case**: This model is not a replacement for human review. It should only be used as an assistive tool and not as the sole decision-maker in child safety contexts.
- **Ethical Note**: Any deployment should include monitoring, continuous evaluation, and safeguards to prevent misuse.

---

## How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("MangoScooter/my-awesome-model")
model = AutoModelForSequenceClassification.from_pretrained("MangoScooter/my-awesome-model")

# Tokenize and run a forward pass
inputs = tokenizer("Your text here", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities and look up the predicted label name
probs = torch.softmax(outputs.logits, dim=-1)
predicted = model.config.id2label[int(probs.argmax())]
```

---

## Citation

If you use this model, please cite:

```bibtex
@misc{my-awesome-model,
  author       = {Erin Clarke},
  title        = {My Awesome Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MangoScooter/my-awesome-model}}
}
```
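---

## Appendix: How the Reported Metrics Relate

As a footnote on the evaluation metrics reported above, the sketch below shows how precision, recall, and F1 are computed from true and predicted labels. The labels here are toy values invented for illustration, not the model's actual test-set outputs.

```python
# Toy illustration of precision/recall/F1; labels are invented, not model outputs.
def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 1 = harmful, 0 = safe
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
print(p, r, f)  # → 0.75 0.75 0.75
```

In practice you would use `sklearn.metrics.precision_recall_fscore_support` rather than hand-rolling these; the point is only that F1 is the harmonic mean of the two error trade-offs discussed in the Limitations section.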