---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- transformers
- modernbert
- text-classification
- propaganda-detection
- binary-classification
- nci-protocol
datasets:
- synapti/nci-propaganda-production
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---
# NCI Binary Propaganda Detector
Binary classifier that detects whether text contains propaganda/manipulation techniques.
## Model Description
This model is Stage 1 of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:
- Stage 1 (this model): Fast binary detection - "Does this text contain propaganda?"
- Stage 2: Multi-label technique classification - "Which specific techniques are used?"
The binary detector is optimized for high recall to ensure manipulative content is not missed, while Stage 2 provides detailed technique classification.
## Intended Uses
- Fast filtering of content for propaganda presence
- First-pass screening in content moderation pipelines
- Real-time detection in social media monitoring
- Input gating for detailed technique analysis
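As a sketch of the first-pass screening use case, the snippet below filters a batch of texts and keeps only the items the detector flags (the example texts and the 0.5 score threshold are illustrative assumptions, not tuned values):

```python
from transformers import pipeline

detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Screen a batch of posts and keep only those flagged for deeper analysis.
posts = [
    "They will stop at NOTHING to silence us!",
    "The committee meets on Tuesday at 10am.",
]
results = detector(posts)  # one {'label', 'score'} dict per input

flagged = [
    post for post, r in zip(posts, results)
    if r["label"] == "LABEL_1" and r["score"] >= 0.5  # assumed threshold
]
```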
## Training Data
Trained on the `synapti/nci-propaganda-production` dataset:
- 23,000+ examples from multiple sources
- Positive examples: SemEval-2020 Task 11 propaganda techniques
- Hard negatives: LIAR2 factual statements, Qbias center-biased news
- Train/Val/Test split: 80/10/10
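The dataset can be inspected with the `datasets` library; the split and column names below are assumptions, so check the dataset card for the exact schema:

```python
from datasets import load_dataset

ds = load_dataset("synapti/nci-propaganda-production")
print(ds)              # available splits and features
print(ds["train"][0])  # a single example (column names may differ)
```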
## Performance
| Metric | Score |
|---|---|
| Accuracy | ~95% |
| F1 | ~94% |
| Precision | ~96% |
| Recall | ~92% |
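A minimal sketch for reproducing these metrics with scikit-learn, assuming a `test` split with `text` and `label` (0/1) columns; the exact schema may differ, and the numbers above were not produced by this snippet:

```python
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import pipeline

ds = load_dataset("synapti/nci-propaganda-production", split="test")  # assumed split name
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

preds = [int(r["label"] == "LABEL_1") for r in detector(ds["text"], truncation=True)]
labels = ds["label"]  # assumed 0/1 integer labels

acc = accuracy_score(labels, preds)
prec, rec, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```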
## Usage

```python
from transformers import pipeline

# Load the model
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Detect propaganda
text = "The radical left wants to DESTROY our country!"
result = detector(text)[0]
# Result: {'label': 'LABEL_1', 'score': 0.99}
# LABEL_0 = no propaganda, LABEL_1 = has propaganda
```
## Two-Stage Pipeline
For complete propaganda analysis, use with the technique classifier:
```python
from transformers import pipeline

binary = pipeline("text-classification", model="synapti/nci-binary-detector")
technique = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

text = "Your text here..."

# Stage 1: Binary detection
binary_result = binary(text)[0]
has_propaganda = binary_result["label"] == "LABEL_1"

if has_propaganda:
    # Stage 2: Technique classification
    techniques = technique(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
```
## Model Architecture
- Base Model: answerdotai/ModernBERT-base
- Parameters: 149.6M
- Max Sequence Length: 512 tokens
- Output: 2 classes (no_propaganda, has_propaganda)
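For lower-level use, the checkpoint can also be loaded directly. The values in the comments are expectations based on the figures above, not verified output, and ModernBERT requires a recent `transformers` release:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("synapti/nci-binary-detector")
model = AutoModelForSequenceClassification.from_pretrained("synapti/nci-binary-detector")

print(model.config.num_labels)   # expected: 2
print(model.num_parameters())    # expected: ~149.6M

# Truncate inputs to the 512-token limit before inference.
inputs = tokenizer("Some text to classify", truncation=True, max_length=512, return_tensors="pt")
logits = model(**inputs).logits  # shape: (1, 2)
```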
## Training Details
- Loss Function: Focal Loss (gamma=2.0, alpha=0.25)
- Optimizer: AdamW
- Learning Rate: 2e-5
- Batch Size: 16 (effective 64 with gradient accumulation)
- Epochs: 5 with early stopping
- Hardware: NVIDIA A10G GPU
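For reference, a minimal PyTorch sketch of the binary focal loss with the hyperparameters listed above; this is an illustrative re-implementation, not the exact training code:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss over 2-class logits (illustrative re-implementation)."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # per-example cross-entropy
    pt = torch.exp(-ce)                                      # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # per-class weighting
    return (alpha_t * (1 - pt) ** gamma * ce).mean()         # down-weight easy examples

# Example: batch of 4 examples, 2 classes
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 1, 0])
loss = focal_loss(logits, targets)
```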
## Limitations
- Trained primarily on English text
- May not detect novel propaganda techniques not in training data
- Optimized for short-to-medium length text (tweets, headlines, paragraphs)
- Should be used as part of a larger analysis pipeline, not as sole arbiter
## Citation

```bibtex
@misc{nci-binary-detector,
  author = {NCI Protocol Team},
  title = {NCI Binary Propaganda Detector},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-binary-detector}
}
```
## License
Apache 2.0