NCI Binary Detector

Fast binary classifier that detects whether text contains propaganda techniques.

Model Description

This model is Stage 1 of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

Stage 1 (this model): Fast binary detection - "Does this text contain propaganda?"
Stage 2: Multi-label technique classification - "Which specific techniques are used?"

The binary detector serves as a fast filter with high recall, passing flagged content to the more detailed technique classifier.

Labels

Label	Description
`no_propaganda`	Text does not contain propaganda techniques
`has_propaganda`	Text contains one or more propaganda techniques

Performance

Test Set Results:

Metric	Score
Accuracy	99.5%
F1 Score	99.6%
Precision	99.2%
Recall	100.0%
ROC AUC	99.9%
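As a consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two figures in the table. A minimal sketch using only the values listed above (`f1_score` here is a local helper, not a library call):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the results table
reported_precision = 0.992
reported_recall = 1.000

f1 = f1_score(reported_precision, reported_recall)
print(f"F1: {f1:.1%}")  # F1: 99.6%
```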

Usage

Basic Usage

from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector"
)

text = "The radical left is DESTROYING our country!"
result = detector(text)[0]

print(f"Label: {result['label']}")  # 'has_propaganda' or 'no_propaganda'
print(f"Confidence: {result['score']:.2%}")
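Because the pipeline returns only the top label and its score, a custom decision threshold is easier to apply to the probability of the `has_propaganda` class. The sketch below assumes a two-class softmax output, so that the two class scores sum to 1. `propaganda_probability` and `is_flagged` are hypothetical helper names, and the 0.3 threshold is purely illustrative, not a tuned value.

```python
def propaganda_probability(result: dict) -> float:
    """Convert a top-label pipeline result into P(has_propaganda).

    Assumes a two-class softmax output, so the scores sum to 1.
    """
    if result["label"] == "has_propaganda":
        return result["score"]
    return 1.0 - result["score"]


def is_flagged(result: dict, threshold: float = 0.5) -> bool:
    """Flag text when P(has_propaganda) reaches the threshold.

    Lowering the threshold trades precision for recall.
    """
    return propaganda_probability(result) >= threshold


# Hand-written result dict in the pipeline's output format (illustrative values)
example = {"label": "no_propaganda", "score": 0.62}
print(is_flagged(example))                 # default threshold of 0.5
print(is_flagged(example, threshold=0.3))  # more recall-oriented threshold
```

With a real pipeline, `detector(text)[0]` can be passed in place of `example`.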

Two-Stage Pipeline

For best results, use this model together with the technique classifier:

from transformers import pipeline

# Stage 1: Binary detection
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Stage 2: Technique classification (only if propaganda detected)
classifier = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

text = "Your text to analyze..."

# Quick check first
detection = detector(text)[0]
if detection["label"] == "has_propaganda" and detection["score"] > 0.5:
    # Detailed technique analysis
    techniques = classifier(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
    for t in detected:
        print(f"{t['label']}: {t['score']:.2%}")
else:
    print("No propaganda detected")
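The same two-stage flow can be wrapped in a reusable function. This is a sketch rather than part of the published API: `analyze` is a hypothetical helper, and the stand-in callables below only mimic the output shape of the two pipelines so the example runs without downloading any model. The technique label names are placeholders.

```python
from typing import Callable, Dict, List


def analyze(
    text: str,
    detector: Callable,
    classifier: Callable,
    technique_threshold: float = 0.3,
) -> Dict[str, object]:
    """Run Stage 1, and run Stage 2 only when Stage 1 flags the text."""
    detection = detector(text)[0]
    has_propaganda = detection["label"] == "has_propaganda"

    techniques: List[dict] = []
    if has_propaganda:
        # Keep only techniques scoring above the threshold
        scores = classifier(text)[0]
        techniques = [t for t in scores if t["score"] > technique_threshold]

    return {
        "has_propaganda": has_propaganda,
        "detection_score": detection["score"],
        "techniques": techniques,
    }


# Stand-ins that mimic the pipelines' output shape (illustrative values only)
def fake_detector(text):
    return [{"label": "has_propaganda", "score": 0.97}]


def fake_classifier(text):
    return [[
        {"label": "technique_a", "score": 0.81},
        {"label": "technique_b", "score": 0.12},
    ]]


report = analyze("Your text to analyze...", fake_detector, fake_classifier)
print(report["techniques"])
```

In real use, pass the `detector` and `classifier` pipelines created above instead of the stand-ins.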

Training Data

Trained on synapti/nci-propaganda-production:

More than 23,000 examples from multiple sources
Positive examples: Text containing one or more propaganda techniques (from SemEval-2020, plus augmented data)
Hard negatives: Factual content from the LIAR2 and QBias datasets
Class imbalance is handled with a class-weighted Focal Loss (gamma=2.0)

Model Architecture

Base Model: answerdotai/ModernBERT-base
Parameters: 149.6M
Max Sequence Length: 512 tokens
Output: 2 labels (binary classification)

Training Details

Loss Function: Focal Loss (gamma=2.0, alpha=0.25)
Optimizer: AdamW
Learning Rate: 2e-5
Batch Size: 16 (effective 32 with gradient accumulation)
Epochs: 5 with early stopping (patience=3)
Hardware: NVIDIA A10G GPU
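For reference, the focal loss named above down-weights easy examples by scaling the cross-entropy term with (1 - p_t)^gamma. The following is a minimal pure-Python sketch of the standard binary formulation for a single example, using the gamma and alpha values listed above. It is not the project's training code, and how alpha and the class weights are actually assigned per class is an assumption here: alpha is applied to the positive class and 1 - alpha to the negative class.

```python
import math


def binary_focal_loss(p_positive: float, target: int,
                      gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p_positive: predicted probability of the positive class.
    target: 1 for the positive class, 0 for the negative class.
    """
    # p_t is the probability assigned to the true class
    p_t = p_positive if target == 1 else 1.0 - p_positive
    # Assumption: alpha weights the positive class, (1 - alpha) the negative class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction contributes far less than a hard one
easy = binary_focal_loss(p_positive=0.95, target=1)
hard = binary_focal_loss(p_positive=0.30, target=1)
print(f"easy: {easy:.6f}, hard: {hard:.6f}")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a quick way to sanity-check an implementation.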

Limitations

Trained primarily on English text
Works best on content similar to the training distribution (news articles, social media posts)
May miss subtle or novel propaganda techniques that are absent from the training data
Should be used alongside human review for high-stakes applications

Related Models

synapti/nci-technique-classifier - Stage 2 multi-label technique classifier

Citation

@inproceedings{da-san-martino-etal-2020-semeval,
    title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
    author = "Da San Martino, Giovanni and others",
    booktitle = "Proceedings of SemEval-2020",
    year = "2020",
}

@misc{nci-binary-detector,
  author = {NCI Protocol Team},
  title = {NCI Binary Detector},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/synapti/nci-binary-detector}
}

License

Apache 2.0
