---
license: apache-2.0
datasets:
- synapti/nci-propaganda-production
base_model: answerdotai/ModernBERT-base
tags:
- transformers
- modernbert
- text-classification
- propaganda-detection
- binary-classification
- nci-protocol
library_name: transformers
pipeline_tag: text-classification
---

# NCI Binary Detector

Fast binary classifier that detects whether text contains propaganda techniques.

## Model Description

This model is **Stage 1** of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

- **Stage 1 (this model)**: Fast binary detection - "Does this text contain propaganda?"
- **Stage 2**: Multi-label technique classification - "Which specific techniques are used?"

The binary detector acts as a fast, high-recall filter: only text it flags is passed on to the slower, more detailed technique classifier.

## Labels

| Label | Description |
|-------|-------------|
| `no_propaganda` | Text does not contain propaganda techniques |
| `has_propaganda` | Text contains one or more propaganda techniques |

## Performance

**Test Set Results:**

| Metric | Score |
|--------|-------|
| Accuracy | 99.5% |
| F1 Score | 99.6% |
| Precision | 99.2% |
| Recall | 100.0% |
| ROC AUC | 99.9% |
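
As a quick consistency check, the F1 score can be recomputed from the precision and recall in the table. This is only arithmetic on the reported numbers, not a re-evaluation of the model:

```python
# Values copied from the test-set table above.
precision = 0.992
recall = 1.000

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1: {f1:.1%}")  # F1: 99.6%
```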

## Usage

### Basic Usage

```python
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector"
)

text = "The radical left is DESTROYING our country!"
result = detector(text)[0]

print(f"Label: {result['label']}")  # 'has_propaganda' or 'no_propaganda'
print(f"Confidence: {result['score']:.2%}")
```
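
The pipeline reports the score of whichever label wins, so a low-confidence `no_propaganda` result and a high-confidence `has_propaganda` result are not directly comparable. The sketch below converts a result into a single propaganda probability. It assumes the scores for the two labels sum to 1 (the pipeline's default softmax for two-label heads); `propaganda_probability` is a hypothetical helper name, and the input dicts are hand-written examples rather than real model output.

```python
def propaganda_probability(result: dict) -> float:
    """Return P(has_propaganda) from one pipeline result dict."""
    if result["label"] == "has_propaganda":
        return result["score"]
    # The winning label was no_propaganda, so take the complement.
    return 1.0 - result["score"]


# Hand-written dicts in the shape the pipeline returns (not real model output).
examples = [
    {"label": "has_propaganda", "score": 0.97},
    {"label": "no_propaganda", "score": 0.80},
]
for example in examples:
    print(f"{example['label']}: P(propaganda) = {propaganda_probability(example):.2f}")
```

A single probability makes it easier to choose a custom decision threshold, for example a lower one when recall matters more than precision.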

### Two-Stage Pipeline

For best results, pair this model with the Stage 2 technique classifier, running it only on text that Stage 1 flags:

```python
from transformers import pipeline

# Stage 1: Binary detection
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Stage 2: Technique classification (only if propaganda detected)
classifier = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

text = "Your text to analyze..."

# Quick check first
detection = detector(text)[0]
if detection["label"] == "has_propaganda" and detection["score"] > 0.5:
    # Detailed technique analysis
    techniques = classifier(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
    for t in detected:
        print(f"{t['label']}: {t['score']:.2%}")
else:
    print("No propaganda detected")
```
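
The same gating logic extends to batches: run Stage 1 on every text, then send only the flagged ones to Stage 2. The following is a minimal sketch, not part of the NCI codebase. `analyze_batch` is a hypothetical helper, and the two stub functions stand in for the real pipelines so the example runs without downloading models; the label `technique_a` is a placeholder, not a real technique label.

```python
from typing import Callable, List


def analyze_batch(
    texts: List[str],
    detect: Callable[[List[str]], List[dict]],
    classify: Callable[[List[str]], List[List[dict]]],
    technique_threshold: float = 0.3,
) -> List[List[dict]]:
    """Return, for each text, the techniques scoring above the threshold."""
    detections = detect(texts)
    results: List[List[dict]] = [[] for _ in texts]
    # Indices of texts that Stage 1 flagged as propaganda.
    flagged = [i for i, d in enumerate(detections) if d["label"] == "has_propaganda"]
    if flagged:
        outputs = classify([texts[i] for i in flagged])
        for i, techniques in zip(flagged, outputs):
            results[i] = [t for t in techniques if t["score"] > technique_threshold]
    return results


# Stubs standing in for the two pipelines (hypothetical behaviour).
def fake_detect(batch: List[str]) -> List[dict]:
    return [
        {"label": "has_propaganda" if "!" in text else "no_propaganda", "score": 0.9}
        for text in batch
    ]


def fake_classify(batch: List[str]) -> List[List[dict]]:
    return [
        [{"label": "technique_a", "score": 0.8}, {"label": "technique_b", "score": 0.1}]
        for _ in batch
    ]


report = analyze_batch(["Calm factual sentence.", "They are RUINING everything!"],
                       fake_detect, fake_classify)
print(report)
```

With the pipelines defined above, `detector` and `classifier` should be usable in place of the stubs, since both accept a list of strings.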

## Training Data

Trained on [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production):

- **23,000+ examples** from multiple sources
- **Positive examples**: Text with one or more propaganda techniques (from SemEval-2020 and augmented data)
- **Hard negatives**: Factual content from LIAR2, QBias datasets
- **Class-weighted Focal Loss** to handle imbalance (gamma=2.0)

## Model Architecture

- **Base Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Parameters**: 149.6M
- **Max Sequence Length**: 512 tokens
- **Output**: 2 labels (binary classification)
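
Because of the 512-token limit, longer inputs have to be truncated or split before classification. One common approach is to split the token sequence into overlapping windows and classify each window separately. The sketch below shows only the windowing step on a plain list of token IDs; `sliding_windows` and its `stride` parameter are hypothetical, and in practice `max_len` would need to be slightly below 512 to leave room for special tokens.

```python
from typing import List


def sliding_windows(token_ids: List[int], max_len: int = 512, stride: int = 256) -> List[List[int]]:
    """Split token_ids into overlapping windows of at most max_len tokens."""
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if len(token_ids) <= max_len:
        return [token_ids]
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return windows


# Dummy token IDs; a real tokenizer would supply these.
windows = sliding_windows(list(range(1000)))
print([len(w) for w in windows])  # [512, 512, 488]
```

A document could then be flagged if any window is classified as `has_propaganda`, which favors recall over precision.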

## Training Details

- **Loss Function**: Focal Loss (gamma=2.0, alpha=0.25)
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 16 (effective 32 with gradient accumulation)
- **Epochs**: 5 with early stopping (patience=3)
- **Hardware**: NVIDIA A10G GPU
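
To illustrate what the focal loss settings do, here is a minimal per-example sketch using the standard form $-\alpha (1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the probability the model assigns to the true class. It applies one `alpha` to every example for simplicity; the class-weighted variant used in training is not specified in this card, so this should not be read as the actual training code.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Standard cross-entropy for one example."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one example, given the probability of its true class."""
    # (1 - p_true) ** gamma shrinks the loss for well-classified examples.
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


for p in (0.95, 0.30):  # an easy example and a hard one
    ce = cross_entropy(p)
    fl = focal_loss(p)
    print(f"p_true={p:.2f}  CE={ce:.4f}  focal={fl:.6f}  ratio={fl / ce:.4f}")
```

The ratio column shows the down-weighting: the easy example keeps a far smaller share of its cross-entropy loss than the hard one, so training gradients concentrate on hard examples.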

## Limitations

- Trained primarily on English text
- Works best on content similar to training distribution (news articles, social media posts)
- May miss subtle or novel propaganda techniques that are absent from the training data
- Should be used alongside human review for high-stakes applications

## Related Models

- [synapti/nci-technique-classifier](https://huggingface.co/synapti/nci-technique-classifier) - Stage 2 multi-label technique classifier

## Citation

```bibtex
@inproceedings{da-san-martino-etal-2020-semeval,
    title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
    author = "Da San Martino, Giovanni and others",
    booktitle = "Proceedings of SemEval-2020",
    year = "2020",
}

@misc{nci-binary-detector,
  author = {NCI Protocol Team},
  title = {NCI Binary Detector},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-binary-detector}
}
```

## License

Apache 2.0