---
license: apache-2.0
datasets:
- synapti/nci-propaganda-production
base_model: answerdotai/ModernBERT-base
tags:
- transformers
- modernbert
- text-classification
- propaganda-detection
- binary-classification
- nci-protocol
library_name: transformers
pipeline_tag: text-classification
---
# NCI Binary Detector
Fast binary classifier that detects whether text contains propaganda techniques.
## Model Description
This model is **Stage 1** of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:
- **Stage 1 (this model)**: Fast binary detection - "Does this text contain propaganda?"
- **Stage 2**: Multi-label technique classification - "Which specific techniques are used?"
The binary detector serves as a fast filter with high recall, passing flagged content to the more detailed technique classifier.
## Labels
| Label | Description |
|-------|-------------|
| `no_propaganda` | Text does not contain propaganda techniques |
| `has_propaganda` | Text contains one or more propaganda techniques |
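The label names are stored in the model config, so downstream code can read the mapping instead of hard-coding strings. A minimal check (the exact integer-to-label assignment is whatever ships in the checkpoint, so verify it on your download):
```python
from transformers import AutoConfig

# Read the label mapping shipped with the checkpoint.
config = AutoConfig.from_pretrained("synapti/nci-binary-detector")
print(config.id2label)   # expected to contain 'no_propaganda' and 'has_propaganda'
print(config.label2id)
```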
## Performance
**Test Set Results:**
| Metric | Score |
|--------|-------|
| Accuracy | 99.5% |
| F1 Score | 99.6% |
| Precision | 99.2% |
| Recall | 100.0% |
| ROC AUC | 99.9% |
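The scores above come from the held-out test split. A rough sketch of how comparable metrics can be computed with scikit-learn on your own labeled sample (the texts and gold labels below are placeholders, not the actual test set):
```python
from transformers import pipeline
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Placeholder evaluation data: gold label 1 = has_propaganda, 0 = no_propaganda.
texts = ["The radical left is DESTROYING our country!", "The meeting starts at 3 pm."]
gold = [1, 0]

probs, preds = [], []
for out in detector(texts):
    # Convert the top-label score to P(has_propaganda).
    p = out["score"] if out["label"] == "has_propaganda" else 1.0 - out["score"]
    probs.append(p)
    preds.append(int(p > 0.5))

print("Accuracy :", accuracy_score(gold, preds))
print("Precision:", precision_score(gold, preds))
print("Recall   :", recall_score(gold, preds))
print("F1       :", f1_score(gold, preds))
print("ROC AUC  :", roc_auc_score(gold, probs))
```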
## Usage
### Basic Usage
```python
from transformers import pipeline
detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector"
)
text = "The radical left is DESTROYING our country!"
result = detector(text)[0]
print(f"Label: {result['label']}") # 'has_propaganda' or 'no_propaganda'
print(f"Confidence: {result['score']:.2%}")
```
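The pipeline also accepts a list of texts for batched screening; the batch size and truncation settings below are illustrative and can be tuned to your hardware:
```python
from transformers import pipeline

detector = pipeline("text-classification", model="synapti/nci-binary-detector")

texts = ["First article to screen...", "Second article to screen..."]
results = detector(texts, batch_size=8, truncation=True, max_length=512)

for text, res in zip(texts, results):
    print(f"{res['label']:>15} {res['score']:.2%}  {text[:40]}")
```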
### Two-Stage Pipeline
For best results, use with the technique classifier:
```python
from transformers import pipeline
# Stage 1: Binary detection
detector = pipeline("text-classification", model="synapti/nci-binary-detector")
# Stage 2: Technique classification (only if propaganda detected)
classifier = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)
text = "Your text to analyze..."
# Quick check first
detection = detector(text)[0]
if detection["label"] == "has_propaganda" and detection["score"] > 0.5:
    # Detailed technique analysis
    techniques = classifier(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
    for t in detected:
        print(f"{t['label']}: {t['score']:.2%}")
else:
    print("No propaganda detected")
```
## Training Data
Trained on [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production):
- **23,000+ examples** from multiple sources
- **Positive examples**: Text containing one or more propaganda techniques (from SemEval-2020 Task 11, plus augmented data)
- **Hard negatives**: Factual content from the LIAR2 and QBias datasets
- **Class-weighted Focal Loss** (gamma=2.0) to handle class imbalance; a sketch of the loss follows below
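A minimal sketch of a class-weighted focal loss as described above. The gamma and alpha values match this card; the actual training code is not published here, so treat this as illustrative:
```python
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=2.0, alpha=0.25):
    """Class-weighted focal loss over 2-class logits.

    Down-weights easy examples so training focuses on hard cases;
    alpha re-weights the positive (has_propaganda) class.
    """
    ce = F.cross_entropy(logits, labels, reduction="none")  # per-example cross-entropy
    pt = torch.exp(-ce)                                      # probability of the true class
    labels = labels.float()
    alpha_t = labels * alpha + (1.0 - labels) * (1.0 - alpha)
    return (alpha_t * (1.0 - pt) ** gamma * ce).mean()
```
In a `Trainer` setup, a loss like this would typically replace the default cross-entropy in a custom `compute_loss` override.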
## Model Architecture
- **Base Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Parameters**: 149.6M
- **Max Sequence Length**: 512 tokens
- **Output**: 2 labels (binary classification; see the inference snippet below)
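Outside the pipeline API, the model can be called directly with the tokenizer and a softmax over the two logits. Truncating to 512 tokens mirrors the training setup; the snippet is illustrative:
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "synapti/nci-binary-detector"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

inputs = tokenizer(
    "Text to score...",
    truncation=True,
    max_length=512,      # matches the training sequence length
    return_tensors="pt",
)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1).squeeze(0)

for idx, p in enumerate(probs.tolist()):
    print(model.config.id2label[idx], f"{p:.2%}")
```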
## Training Details
- **Loss Function**: Focal Loss (gamma=2.0, alpha=0.25)
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 16 (effective 32 with gradient accumulation)
- **Epochs**: 5 with early stopping (patience=3); a configuration sketch follows below
- **Hardware**: NVIDIA A10G GPU
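A rough `TrainingArguments` sketch using the hyperparameters listed above. The output directory, evaluation strategy, and best-model metric are assumptions, not published settings:
```python
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="nci-binary-detector",     # placeholder path
    learning_rate=2e-5,                   # AdamW is the Trainer default optimizer
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,        # effective batch size of 32
    num_train_epochs=5,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",           # assumes a compute_metrics that reports "f1"
)
callbacks = [EarlyStoppingCallback(early_stopping_patience=3)]
```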
## Limitations
- Trained primarily on English text
- Works best on content similar to training distribution (news articles, social media posts)
- May not detect subtle or novel propaganda techniques not in training data
- Should be used alongside human review for high-stakes applications
## Related Models
- [synapti/nci-technique-classifier](https://huggingface.co/synapti/nci-technique-classifier) - Stage 2 multi-label technique classifier
## Citation
```bibtex
@inproceedings{da-san-martino-etal-2020-semeval,
title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
author = "Da San Martino, Giovanni and others",
booktitle = "Proceedings of SemEval-2020",
year = "2020",
}
@misc{nci-binary-detector,
author = {NCI Protocol Team},
title = {NCI Binary Detector},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/synapti/nci-binary-detector}
}
```
## License
Apache 2.0