---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
  - transformers
  - modernbert
  - text-classification
  - propaganda-detection
  - binary-classification
  - nci-protocol
datasets:
  - synapti/nci-propaganda-production
metrics:
  - accuracy
  - f1
  - precision
  - recall
pipeline_tag: text-classification
---

NCI Binary Propaganda Detector

A binary classifier that detects whether a text contains propaganda or manipulation techniques.

Model Description

This model is Stage 1 of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

  • Stage 1 (this model): Fast binary detection - "Does this text contain propaganda?"
  • Stage 2: Multi-label technique classification - "Which specific techniques are used?"

The binary detector is tuned for high recall to minimize the chance that manipulative content is missed; Stage 2 then provides the detailed technique classification.
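If recall matters more than precision in your setting, one option is to compare the `LABEL_1` probability against a threshold below 0.5 instead of taking the top label. The sketch below assumes the scores come from calling the pipeline with `top_k=None`, which returns a score for every label; the function name `is_propaganda` and the 0.3 default are illustrative choices, not values from this card.

```python
from typing import Dict, List


def is_propaganda(scores: List[Dict], threshold: float = 0.3) -> bool:
    """Flag text as propaganda when P(LABEL_1) meets a (hypothetical) threshold.

    `scores` is the list of {"label": ..., "score": ...} dicts that a
    text-classification pipeline returns for one input when called with top_k=None.
    """
    by_label = {entry["label"]: entry["score"] for entry in scores}
    # A threshold below 0.5 trades precision for recall.
    return by_label.get("LABEL_1", 0.0) >= threshold


# Made-up scores: the top label is LABEL_0, but 0.35 >= 0.3 so the text is flagged.
example = [{"label": "LABEL_0", "score": 0.65}, {"label": "LABEL_1", "score": 0.35}]
print(is_propaganda(example))  # True
```

With the model loaded as in the Usage section, this would be called as `is_propaganda(detector(text, top_k=None))`. The threshold itself should be chosen on validation data.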

Intended Uses

  • Fast filtering of content for propaganda presence
  • First-pass screening in content moderation pipelines
  • Real-time detection in social media monitoring
  • Input gating for detailed technique analysis

Training Data

Trained on the synapti/nci-propaganda-production dataset:

  • 23,000+ examples from multiple sources
  • Positive examples: SemEval-2020 Task 11 propaganda techniques
  • Hard negatives: LIAR2 factual statements, Qbias center-biased news
  • Train/Val/Test split: 80/10/10
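The card states the split ratios but not how the split was drawn (random or stratified, which seed). As an illustration only, a seeded random 80/10/10 split over roughly 23,000 examples looks like this; `split_80_10_10` is a hypothetical helper, and if the dataset on the Hub already ships train/validation/test splits, those should be used instead.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle deterministically, then cut into 80% train, 10% val, 10% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # any remainder goes to test
    return train, val, test


train, val, test = split_80_10_10(range(23_000))
print(len(train), len(val), len(test))  # 18400 2300 2300
```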

Performance

| Metric | Score |
|-----------|-------|
| Accuracy | ~95% |
| F1 | ~94% |
| Precision | ~96% |
| Recall | ~92% |
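The card does not say how the metrics were averaged; the sketch below assumes they are computed for the positive (`LABEL_1`) class on held-out data. F1 is the harmonic mean of precision and recall, so the table's approximate precision (0.96) and recall (0.92) should give an F1 near 0.94, which matches the reported value.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 for the positive class (LABEL_1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Consistency check against the table: P ~ 0.96 and R ~ 0.92 imply F1 ~ 0.94.
print(round(f1_score(0.96, 0.92), 2))  # 0.94
```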

Usage

```python
from transformers import pipeline

# Load the model
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Detect propaganda
text = "The radical left wants to DESTROY our country!"
result = detector(text)
print(result)

# Result: [{'label': 'LABEL_1', 'score': 0.99}]
# LABEL_0 = no propaganda, LABEL_1 = has propaganda
```
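The pipeline reports generic `LABEL_0`/`LABEL_1` names. A small helper can map them to the class names given under Model Architecture; the mapping below is taken from the comment in the example above, and `readable` is a hypothetical name.

```python
from typing import Dict, List

# Assumed mapping, based on the LABEL_0/LABEL_1 comment in the usage example
# and the class names listed under Model Architecture.
LABEL_NAMES = {"LABEL_0": "no_propaganda", "LABEL_1": "has_propaganda"}


def readable(results: List[Dict]) -> List[Dict]:
    """Replace generic pipeline labels with descriptive class names."""
    return [
        {"label": LABEL_NAMES.get(r["label"], r["label"]), "score": r["score"]}
        for r in results
    ]


print(readable([{"label": "LABEL_1", "score": 0.99}]))
# [{'label': 'has_propaganda', 'score': 0.99}]
```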

Two-Stage Pipeline

For complete propaganda analysis, use it together with the technique classifier:

```python
from transformers import pipeline

binary = pipeline("text-classification", model="synapti/nci-binary-detector")
# top_k=None returns a score for every technique label (multi-label output)
technique = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

text = "Your text here..."

# Stage 1: Binary detection
binary_result = binary(text)[0]
has_propaganda = binary_result["label"] == "LABEL_1"

detected = []
if has_propaganda:
    # Stage 2: Technique classification; keep techniques scoring above 0.3
    techniques = technique(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]

print(has_propaganda, detected)
```
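The same two-stage logic can be wrapped in one function that takes the two stages as plain callables, which makes it easy to test without downloading either model. This is a sketch: `analyze`, the stub stages, and the label names `technique_A`/`technique_B` are hypothetical, and the 0.3 cut-off is the value used in the example above.

```python
from typing import Callable, Dict, List

# Each stage is any callable shaped like the pipelines above:
# binary(text) -> [dict]; technique(text) with top_k=None -> [[dict, ...]].
BinaryStage = Callable[[str], List[Dict]]
TechniqueStage = Callable[[str], List[List[Dict]]]


def analyze(text: str, binary: BinaryStage, technique: TechniqueStage,
            min_score: float = 0.3) -> Dict:
    """Run Stage 1, and run Stage 2 only when Stage 1 flags propaganda."""
    stage1 = binary(text)[0]
    if stage1["label"] != "LABEL_1":
        return {"has_propaganda": False, "techniques": []}
    scores = technique(text)[0]
    kept = sorted((t for t in scores if t["score"] > min_score),
                  key=lambda t: t["score"], reverse=True)
    return {"has_propaganda": True, "techniques": [t["label"] for t in kept]}


# Hypothetical stub stages standing in for the real pipelines.
def fake_binary(text: str) -> List[Dict]:
    return [{"label": "LABEL_1", "score": 0.97}]


def fake_technique(text: str) -> List[List[Dict]]:
    return [[{"label": "technique_A", "score": 0.8},
             {"label": "technique_B", "score": 0.1}]]


print(analyze("Your text here...", fake_binary, fake_technique))
# {'has_propaganda': True, 'techniques': ['technique_A']}
```

With the real models, the pipelines can be passed directly: `analyze(text, binary, technique)`.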

Model Architecture

  • Base Model: answerdotai/ModernBERT-base
  • Parameters: 149.6M
  • Max Sequence Length: 512 tokens
  • Output: 2 classes (no_propaganda, has_propaganda)

Training Details

  • Loss Function: Focal Loss (gamma=2.0, alpha=0.25)
  • Optimizer: AdamW
  • Learning Rate: 2e-5
  • Batch Size: 16 (effective 64 with gradient accumulation)
  • Epochs: 5 with early stopping
  • Hardware: NVIDIA A10G GPU
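Focal loss scales cross-entropy by (1 - p_t)^gamma so that well-classified examples contribute less to training, with alpha weighting the two classes. The card gives gamma=2.0 and alpha=0.25 but not the exact formulation; the sketch below assumes the common convention from Lin et al.'s focal loss paper (alpha for positives, 1 - alpha for negatives) and computes the loss for a single example in plain Python. `binary_focal_loss` is a hypothetical name, not the training code used for this model.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one example.

    p: predicted probability of the positive class (LABEL_1).
    y: true label, 1 for propaganda and 0 otherwise.
    Assumes alpha weights positives and (1 - alpha) weights negatives.
    """
    eps = 1e-12  # guards against log(0)
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


# An easy positive (p=0.95) is down-weighted far more than a hard one (p=0.3).
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.3, 1)
print(easy, hard)
```

With gamma=0 the expression reduces to class-weighted cross-entropy, which is a quick way to check an implementation.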

Limitations

  • Trained primarily on English text
  • May not detect novel propaganda techniques not in training data
  • Optimized for short-to-medium length text (tweets, headlines, paragraphs)
  • Should be used as part of a larger analysis pipeline, not as the sole arbiter
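Because the model reads at most 512 tokens, longer documents have to be shortened or split before classification. One common workaround, not described in this card, is to score overlapping windows and keep the highest `LABEL_1` probability, which errs toward flagging. The word-based window sizes below are a rough proxy for tokens and are assumptions; `chunk_words` and `max_propaganda_score` are hypothetical helpers.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping word windows (word counts only approximate tokens)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # last window reached the end
            break
    return chunks


def max_propaganda_score(text: str, score_fn: Callable[[str], float],
                         max_words: int = 200, overlap: int = 50) -> float:
    """Score every window and keep the highest P(LABEL_1) (max aggregation is an assumption)."""
    chunks = chunk_words(text, max_words, overlap)
    return max((score_fn(c) for c in chunks), default=0.0)


long_text = " ".join(f"w{i}" for i in range(500))
print(len(chunk_words(long_text)))  # 3
```

With the real model, `score_fn` could return the `LABEL_1` score from `detector(chunk, top_k=None)`.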

Citation

```bibtex
@misc{nci-binary-detector,
  author = {NCI Protocol Team},
  title = {NCI Binary Propaganda Detector},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-binary-detector}
}
```

License

Apache 2.0