---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- transformers
- modernbert
- text-classification
- propaganda-detection
- binary-classification
- nci-protocol
datasets:
- synapti/nci-propaganda-production
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---

# NCI Binary Propaganda Detector

Binary classifier that detects whether text contains propaganda/manipulation techniques.

## Model Description

This model is **Stage 1** of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

- **Stage 1 (this model)**: Fast binary detection - "Does this text contain propaganda?"
- **Stage 2**: Multi-label technique classification - "Which specific techniques are used?"

The binary detector is optimized for **high recall** to ensure manipulative content is not missed, while Stage 2 provides detailed technique classification.

## Intended Uses

- Fast filtering of content for propaganda presence
- First-pass screening in content moderation pipelines
- Real-time detection in social media monitoring
- Input gating for detailed technique analysis

## Training Data

Trained on the [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production) dataset:

- **23,000+ examples** from multiple sources
- **Positive examples**: SemEval-2020 Task 11 propaganda techniques
- **Hard negatives**: LIAR2 factual statements, Qbias center-biased news
- **Train/Val/Test split**: 80/10/10

## Performance

| Metric    | Score |
|-----------|-------|
| Accuracy  | ~95%  |
| F1        | ~94%  |
| Precision | ~96%  |
| Recall    | ~92%  |

## Usage

```python
from transformers import pipeline

# Load the model
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Detect propaganda
text = "The radical left wants to DESTROY our country!"
result = detector(text)
# Result: {'label': 'LABEL_1', 'score': 0.99}
# LABEL_0 = no propaganda, LABEL_1 = has propaganda
```

### Two-Stage Pipeline

For complete propaganda analysis, use with the technique classifier:

```python
from transformers import pipeline

binary = pipeline("text-classification", model="synapti/nci-binary-detector")
technique = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

text = "Your text here..."

# Stage 1: Binary detection
binary_result = binary(text)[0]
has_propaganda = binary_result["label"] == "LABEL_1"

if has_propaganda:
    # Stage 2: Technique classification
    techniques = technique(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
```

## Model Architecture

- **Base Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Parameters**: 149.6M
- **Max Sequence Length**: 512 tokens
- **Output**: 2 classes (no_propaganda, has_propaganda)

## Training Details

- **Loss Function**: Focal Loss (gamma=2.0, alpha=0.25)
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 16 (effective 64 with gradient accumulation)
- **Epochs**: 5 with early stopping
- **Hardware**: NVIDIA A10G GPU
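
Focal loss down-weights examples the model already classifies confidently, so the gradient signal concentrates on hard cases (such as the hard negatives described above). The sketch below shows the standard two-class focal-loss formulation with the gamma and alpha values listed in Training Details; the `focal_loss` function name and the PyTorch setup are illustrative assumptions, not the exact training code.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Standard focal loss over 2-class logits (illustrative sketch).

    gamma/alpha match the values reported in Training Details, but this is
    not necessarily the exact implementation used to train the model.
    """
    # Per-example cross-entropy; exp(-CE) recovers the true-class probability.
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)
    # alpha weights the positive class, (1 - alpha) the negative class.
    alpha_t = torch.where(targets == 1,
                          torch.full_like(ce, alpha),
                          torch.full_like(ce, 1 - alpha))
    # (1 - pt)^gamma shrinks the loss on easy, confidently-correct examples.
    return (alpha_t * (1 - pt) ** gamma * ce).mean()

# Example: a batch of 4 two-class logits with binary labels
logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])
loss = focal_loss(logits, labels)
```

Relative to plain cross-entropy, this keeps the easy majority of examples from dominating the loss, which complements the high-recall objective for the positive (propaganda) class.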

## Limitations

- Trained primarily on English text
- May not detect novel propaganda techniques not in the training data
- Optimized for short-to-medium length text (tweets, headlines, paragraphs)
- Should be used as part of a larger analysis pipeline, not as the sole arbiter

## Citation

```bibtex
@misc{nci-binary-detector,
  author = {NCI Protocol Team},
  title = {NCI Binary Propaganda Detector},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-binary-detector}
}
```

## License

Apache 2.0