---
license: apache-2.0
datasets:
- synapti/nci-propaganda-production
base_model: answerdotai/ModernBERT-base
tags:
- transformers
- modernbert
- text-classification
- propaganda-detection
- binary-classification
- nci-protocol
library_name: transformers
pipeline_tag: text-classification
---

# NCI Binary Detector

Fast binary classifier that detects whether text contains propaganda techniques.

## Model Description

This model is **Stage 1** of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

- **Stage 1 (this model)**: Fast binary detection - "Does this text contain propaganda?"
- **Stage 2**: Multi-label technique classification - "Which specific techniques are used?"

The binary detector serves as a fast filter with high recall, passing flagged content to the more detailed technique classifier.

## Labels

| Label | Description |
|-------|-------------|
| `no_propaganda` | Text does not contain propaganda techniques |
| `has_propaganda` | Text contains one or more propaganda techniques |

## Performance

**Test Set Results:**

| Metric | Score |
|--------|-------|
| Accuracy | 99.5% |
| F1 Score | 99.6% |
| Precision | 99.2% |
| Recall | 100.0% |
| ROC AUC | 99.9% |

## Usage

### Basic Usage

```python
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector"
)

text = "The radical left is DESTROYING our country!"
result = detector(text)[0]

print(f"Label: {result['label']}")  # 'has_propaganda' or 'no_propaganda'
print(f"Confidence: {result['score']:.2%}")
```

### Two-Stage Pipeline

For best results, use with the technique classifier:

```python
from transformers import pipeline

# Stage 1: Binary detection
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Stage 2: Technique classification (only if propaganda detected)
classifier = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

text = "Your text to analyze..."
# Quick check first
detection = detector(text)[0]

if detection["label"] == "has_propaganda" and detection["score"] > 0.5:
    # Detailed technique analysis
    techniques = classifier(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
    for t in detected:
        print(f"{t['label']}: {t['score']:.2%}")
else:
    print("No propaganda detected")
```

## Training Data

Trained on [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production):

- **23,000+ examples** from multiple sources
- **Positive examples**: Text with 1+ propaganda techniques (from SemEval-2020, augmented data)
- **Hard negatives**: Factual content from LIAR2, QBias datasets
- **Class-weighted Focal Loss** to handle imbalance (gamma=2.0)

## Model Architecture

- **Base Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Parameters**: 149.6M
- **Max Sequence Length**: 512 tokens
- **Output**: 2 labels (binary classification)

## Training Details

- **Loss Function**: Focal Loss (gamma=2.0, alpha=0.25)
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 16 (effective 32 with gradient accumulation)
- **Epochs**: 5 with early stopping (patience=3)
- **Hardware**: NVIDIA A10G GPU

## Limitations

- Trained primarily on English text
- Works best on content similar to training distribution (news articles, social media posts)
- May not detect subtle or novel propaganda techniques not in training data
- Should be used alongside human review for high-stakes applications

## Related Models

- [synapti/nci-technique-classifier](https://huggingface.co/synapti/nci-technique-classifier) - Stage 2 multi-label technique classifier

## Citation

```bibtex
@inproceedings{da-san-martino-etal-2020-semeval,
    title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
    author = "Da San Martino, Giovanni and others",
    booktitle = "Proceedings of SemEval-2020",
    year = "2020",
}

@misc{nci-binary-detector,
    author = {NCI Protocol Team},
    title = {NCI Binary Detector},
    year = {2024},
    publisher = {HuggingFace},
    url = {https://huggingface.co/synapti/nci-binary-detector}
}
```

## License

Apache 2.0
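
## Focal Loss Sketch

Training Details lists Focal Loss with gamma=2.0 and alpha=0.25. The snippet below is a minimal pure-Python sketch of that loss for a single example, not the training code used for this model. It assumes the common convention in which `alpha` weights the positive class (`has_propaganda`) and `1 - alpha` weights the negative class; the function name `focal_loss` and the mapping of labels to 0/1 are illustrative assumptions.

```python
import math


def focal_loss(p_positive: float, label: int,
               gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example (illustrative sketch).

    p_positive: predicted probability of the positive class.
    label: 1 for has_propaganda, 0 for no_propaganda (assumed mapping).
    """
    # Clamp to avoid log(0) on saturated predictions.
    eps = 1e-12
    p_positive = min(max(p_positive, eps), 1.0 - eps)

    # p_t is the probability the model assigned to the true class.
    p_t = p_positive if label == 1 else 1.0 - p_positive
    # Assumed convention: alpha for positives, 1 - alpha for negatives.
    alpha_t = alpha if label == 1 else 1.0 - alpha

    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # A confident correct prediction contributes far less than a poor one.
    for p in (0.95, 0.6, 0.1):
        print(f"p_positive={p:.2f}, label=1 -> loss={focal_loss(p, 1):.6f}")
```

With `gamma=0` and `alpha=1`, the expression reduces to ordinary cross-entropy on a positive example, which is a quick way to sanity-check an implementation.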