---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- transformers
- modernbert
- text-classification
- propaganda-detection
- binary-classification
- nci-protocol
datasets:
- synapti/nci-propaganda-production
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---

# NCI Binary Propaganda Detector

Binary classifier that detects whether text contains propaganda/manipulation techniques.

## Model Description

This model is **Stage 1** of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

- **Stage 1 (this model)**: Fast binary detection - "Does this text contain propaganda?"
- **Stage 2**: Multi-label technique classification - "Which specific techniques are used?"

The binary detector is optimized for **high recall** to ensure manipulative content is not missed, while Stage 2 provides detailed technique classification.

## Intended Uses

- Fast filtering of content for propaganda presence
- First-pass screening in content moderation pipelines
- Real-time detection in social media monitoring
- Input gating for detailed technique analysis

## Training Data

Trained on the [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production) dataset:

- **23,000+ examples** from multiple sources
- **Positive examples**: SemEval-2020 Task 11 propaganda techniques
- **Hard negatives**: LIAR2 factual statements, Qbias center-biased news
- **Train/Val/Test split**: 80/10/10

## Performance

| Metric    | Score |
|-----------|-------|
| Accuracy  | ~95%  |
| F1        | ~94%  |
| Precision | ~96%  |
| Recall    | ~92%  |

## Usage

```python
from transformers import pipeline

# Load the model
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Detect propaganda
text = "The radical left wants to DESTROY our country!"
result = detector(text)
# Result: {'label': 'LABEL_1', 'score': 0.99}
# LABEL_0 = no propaganda, LABEL_1 = has propaganda
```

### Two-Stage Pipeline

For complete propaganda analysis, use with the technique classifier:

```python
from transformers import pipeline

binary = pipeline("text-classification", model="synapti/nci-binary-detector")
technique = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

text = "Your text here..."

# Stage 1: Binary detection
binary_result = binary(text)[0]
has_propaganda = binary_result["label"] == "LABEL_1"

if has_propaganda:
    # Stage 2: Technique classification
    techniques = technique(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
```

## Model Architecture

- **Base Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Parameters**: 149.6M
- **Max Sequence Length**: 512 tokens
- **Output**: 2 classes (no_propaganda, has_propaganda)

## Training Details

- **Loss Function**: Focal Loss (gamma=2.0, alpha=0.25)
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 16 (effective 64 with gradient accumulation)
- **Epochs**: 5 with early stopping
- **Hardware**: NVIDIA A10G GPU
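
Focal loss down-weights examples the model already classifies confidently, so the gradient signal concentrates on hard cases (such as the hard negatives described above). The sketch below shows the standard two-class focal-loss formulation with the gamma and alpha values listed in Training Details; the `focal_loss` function name and the PyTorch setup are illustrative assumptions, not the exact training code.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Standard focal loss over 2-class logits (illustrative sketch).

    gamma/alpha match the values reported in Training Details, but this is
    not necessarily the exact implementation used to train the model.
    """
    # Per-example cross-entropy; exp(-CE) recovers the true-class probability.
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)
    # alpha weights the positive class, (1 - alpha) the negative class.
    alpha_t = torch.where(targets == 1,
                          torch.full_like(ce, alpha),
                          torch.full_like(ce, 1 - alpha))
    # (1 - pt)^gamma shrinks the loss on easy, confidently-correct examples.
    return (alpha_t * (1 - pt) ** gamma * ce).mean()

# Example: a batch of 4 two-class logits with binary labels
logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])
loss = focal_loss(logits, labels)
```

Relative to plain cross-entropy, this keeps the easy majority of examples from dominating the loss, which complements the high-recall objective for the positive (propaganda) class.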

## Limitations

- Trained primarily on English text
- May not detect novel propaganda techniques not in the training data
- Optimized for short-to-medium length text (tweets, headlines, paragraphs)
- Should be used as part of a larger analysis pipeline, not as the sole arbiter

## Citation

```bibtex
@misc{nci-binary-detector,
  author = {NCI Protocol Team},
  title = {NCI Binary Propaganda Detector},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-binary-detector}
}
```

## License

Apache 2.0