🎯 RoBERTa Clickbait Classifier

A clickbait detection model built on RoBERTa-base (125M parameters), fine-tuned on multiple combined and deduplicated English datasets.

🚀 Quick Start

from transformers import pipeline

classifier = pipeline("text-classification", model="ENTUM-AI/roberta-clickbait-classifier")

# Clickbait
result = classifier("You Won't BELIEVE What This Celebrity Did Next!")
print(result)  # [{'label': 'Clickbait', 'score': 0.99...}]

# Non-Clickbait
result = classifier("Federal Reserve raises interest rates by 0.25 percentage points")
print(result)  # [{'label': 'Non-Clickbait', 'score': 0.99...}]

Model Details

Architecture:      RoBERTa-base (125M parameters)
Task:              Binary text classification
Labels:            Clickbait (1), Non-Clickbait (0)
Language:          English
License:           Apache 2.0
Max input length:  128 tokens

📊 Training Data

Three public English clickbait datasets, combined and deduplicated:

Dataset                                       Source
christinacdl/Clickbait_New                    58.6K samples from multiple sources
marksverdhei/clickbait_title_classification   32K samples (Chakraborty et al., ASONAM 2016)
contemmcm/clickbait                           26K samples

After deduplication and balancing: ~48K samples (train/val/test split 85/10/5).
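
The card does not say how deduplication and balancing were done, so the following is only a minimal sketch of one plausible pipeline. It assumes exact matching on lowercased, whitespace-normalized text, balancing by downsampling to the smallest class, and a seeded random shuffle before the 85/10/5 cut. The function name `prepare_splits` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def prepare_splits(rows, seed=42, ratios=(0.85, 0.10, 0.05)):
    """Deduplicate, balance by label, and split (text, label) pairs.

    Illustrative only: the matching rule, the downsampling strategy, and
    the seed are assumptions, not the procedure used for this model.
    """
    # Deduplicate on a normalized form of the text (case- and whitespace-insensitive).
    seen = set()
    unique = []
    for text, label in rows:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append((text, label))

    # Balance by downsampling every class to the size of the smallest one.
    by_label = defaultdict(list)
    for text, label in unique:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    smallest = min(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, smallest))
    rng.shuffle(balanced)

    # Cut into train/val/test; the test split takes whatever remains.
    n = len(balanced)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return (
        balanced[:n_train],
        balanced[n_train:n_train + n_val],
        balanced[n_train + n_val:],
    )
```

Because the shuffle is not stratified, the label proportions in each split are only approximately equal; a stratified split would be a reasonable alternative.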

βš™οΈ Training

Fine-tuned with the Hugging Face Trainer, using the AdamW optimizer, a linear learning-rate schedule with warmup, and early stopping on the F1 score.
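
To make the schedule and the stopping rule concrete, here is a small library-free sketch of both ideas. The card does not give the warmup length, step count, or patience, so every value here is an assumption, and the helper names are hypothetical. In the Trainer itself these behaviours are usually configured through `lr_scheduler_type="linear"`, a warmup setting, `metric_for_best_model`, and `EarlyStoppingCallback` rather than written by hand.

```python
def linear_warmup_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to 1 during warmup.
        return step / max(1, warmup_steps)
    # Then decay linearly from 1 down to 0 at total_steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive class (Clickbait = 1 in this card)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def should_stop(f1_history, patience=2):
    """True once the best F1 is at least `patience` evaluations old."""
    if not f1_history:
        return False
    best_index = f1_history.index(max(f1_history))
    return len(f1_history) - 1 - best_index >= patience
```

With a hypothetical 100 warmup steps out of 1000, the factor reaches 1.0 at step 100 and falls back to 0.5 at step 550.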

💡 Use Cases

  • News aggregators: filter low-quality clickbait articles
  • Social media: content moderation and feed quality scoring
  • Browser extensions: warn users about clickbait headlines
  • Email filters: detect clickbait-style subject lines
  • Content platforms: automated content quality assessment
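
Most of these use cases come down to scoring a batch of headlines and acting on a threshold. Below is a minimal sketch of that pattern. It assumes `classifier` is the pipeline from Quick Start (or any callable returning the same list of `{'label', 'score'}` dictionaries). The `filter_clickbait` helper and the 0.9 threshold are hypothetical and would need tuning for a real application.

```python
def filter_clickbait(headlines, classifier, threshold=0.9):
    """Split headlines into (kept, flagged) by clickbait confidence.

    `classifier` is assumed to accept a list of strings and return one
    {'label': ..., 'score': ...} dict per input, as in Quick Start.
    """
    headlines = list(headlines)
    kept, flagged = [], []
    if not headlines:
        return kept, flagged
    results = classifier(headlines)
    for headline, result in zip(headlines, results):
        # Flag only confident "Clickbait" predictions; keep everything else.
        is_clickbait = result["label"] == "Clickbait" and result["score"] >= threshold
        (flagged if is_clickbait else kept).append(headline)
    return kept, flagged


# Usage with the real model (downloads weights on first run):
#   from transformers import pipeline
#   classifier = pipeline("text-classification",
#                         model="ENTUM-AI/roberta-clickbait-classifier")
#   kept, flagged = filter_clickbait(my_headlines, classifier)
```

A higher threshold flags fewer headlines and reduces false positives, which tends to matter more for user-facing warnings than for internal quality scoring.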

⚠️ Limitations

  • English only
  • Optimized for short texts (headlines, titles, tweets); longer texts will be truncated to 128 tokens
  • Reflects patterns and biases present in the training data sources
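
The 128-token limit can also be requested explicitly at call time. This sketch assumes the text-classification pipeline forwards extra keyword arguments such as `truncation` and `max_length` to its tokenizer, which recent versions of transformers do. The wrapper name `classify_short` is hypothetical.

```python
def classify_short(classifier, texts, max_length=128):
    """Classify texts, asking the tokenizer to cut each to `max_length` tokens."""
    # truncation and max_length are tokenizer arguments; the pipeline is
    # assumed to pass them through unchanged.
    return classifier(list(texts), truncation=True, max_length=max_length)
```

Anything past the limit is dropped before the model sees it, so for long documents it is usually better to classify the headline or title than the body.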