DeBERTa-v3-Small – Factuality / Misinformation Classifier

A lightweight DeBERTa-v3-Small model fine-tuned on TruthfulQA and FEVER to classify statements as factual or non-factual.
Part of the Army of Safeguards research project.

Model Details

Property	Value
Base model	`microsoft/deberta-v3-small`
Architecture	Encoder-only Transformer (≈ 44 M backbone params; ≈ 142 M including the embedding layer)
Task	Binary text classification (0 = factual, 1 = non-factual)
Language	English
Fine-tuning framework	Hugging Face Transformers v4.44
Trained by	Ajith Bondili
Hardware	NVIDIA T4 (Google Colab)
Epochs	3
Batch size	16
Learning rate	2e-5
Max sequence len	256 tokens
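
The hyperparameters above line up with fields of the Hugging Face `TrainingArguments` class. The following is a minimal sketch, not the author's training script: the keyword names are real `TrainingArguments` parameters, but `output_dir`, the 18,000-example training split, and the single-GPU setup without gradient accumulation are assumptions made here only to estimate the number of optimizer steps.

```python
import math

# Hyperparameters from the table, keyed by the matching TrainingArguments names.
training_kwargs = {
    "output_dir": "deberta-v3-factuality-small",  # hypothetical output path
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
}
MAX_SEQ_LEN = 256  # would be passed to the tokenizer as max_length with truncation=True

# Assumption: about 18,000 training examples; the card does not state the split.
ASSUMED_TRAIN_EXAMPLES = 18_000

steps_per_epoch = math.ceil(
    ASSUMED_TRAIN_EXAMPLES / training_kwargs["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
print(steps_per_epoch, total_steps)
```

Under these assumptions the run is a few thousand optimizer steps, which is plausible for a single T4 session.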

Training Data

Merged and balanced from two open-source datasets:

TruthfulQA (generation) – Question/answer pairs labeled truthful or false.
FEVER v1.0 – Real-world claims labeled Supported, Refuted, or Not Enough Info (mapped to binary 0/1).

≈ 20 000 combined examples after cleaning.
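
The card does not say how the three FEVER labels collapse to two classes or how the classes were balanced. The sketch below shows one plausible approach under explicit assumptions: `SUPPORTS` maps to 0, `REFUTES` maps to 1, `NOT ENOUGH INFO` rows are dropped, and the larger class is downsampled. The helper names `to_binary` and `balance` are hypothetical, and the toy rows are placeholders rather than real dataset entries.

```python
import random

# Assumed mapping; the handling of NOT ENOUGH INFO is not stated in the card,
# so those rows are simply dropped here.
FEVER_TO_BINARY = {"SUPPORTS": 0, "REFUTES": 1}

def to_binary(examples):
    """Keep only claims whose FEVER label has a binary mapping."""
    return [
        {"text": ex["claim"], "label": FEVER_TO_BINARY[ex["label"]]}
        for ex in examples
        if ex["label"] in FEVER_TO_BINARY
    ]

def balance(examples, seed=0):
    """Downsample the larger class so both classes have equal counts."""
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for ex in examples:
        by_label[ex["label"]].append(ex)
    n = min(len(by_label[0]), len(by_label[1]))
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)
    return balanced

# Placeholder rows using FEVER's "claim" and "label" fields.
toy = [
    {"claim": "placeholder claim 1", "label": "SUPPORTS"},
    {"claim": "placeholder claim 2", "label": "SUPPORTS"},
    {"claim": "placeholder claim 3", "label": "SUPPORTS"},
    {"claim": "placeholder claim 4", "label": "REFUTES"},
    {"claim": "placeholder claim 5", "label": "NOT ENOUGH INFO"},
]

binary_rows = to_binary(toy)
balanced_rows = balance(binary_rows)
print(len(binary_rows), len(balanced_rows))
```

Dropping `NOT ENOUGH INFO` keeps the binary labels clean; folding it into class 1 instead would be an equally valid reading of "mapped to binary 0/1" and would change the class balance.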

Evaluation Results

Metric	Base Model (M₀)	Fine-Tuned (M₁)	Δ Change
Accuracy	0.52	0.80	+0.28
F1 Score	0.00	0.79	+0.79
Eval Loss	0.69	0.35	−0.34

Confusion Matrix

	Pred Factual	Pred Non-Factual
True Factual	838	205
True Non-Factual	204	753
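
The fine-tuned accuracy and F1 score can be recomputed from the confusion matrix as a consistency check. The sketch below assumes class 1 (non-factual) is the positive class for F1, which the card does not state explicitly.

```python
# Counts from the confusion matrix; class 1 (non-factual) is treated as positive.
tn, fp = 838, 205  # true factual: predicted factual / predicted non-factual
fn, tp = 204, 753  # true non-factual: predicted factual / predicted non-factual

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of a classifier that predicts "factual" for every example.
majority_accuracy = (tn + fp) / total

print(f"accuracy={accuracy:.2f} f1={f1:.2f} majority={majority_accuracy:.2f}")
# prints: accuracy=0.80 f1=0.79 majority=0.52
```

The majority-class figure matches the base model's reported accuracy of 0.52 and F1 of 0.00, which is consistent with an untrained classification head that predicts "factual" for every input, although the card does not confirm this.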

Intended Use

Acts as a truth-checking critic for large-language-model outputs.

Input

Free-form English text (e.g., an LLM response or claim)

Output

{
  "label": "non-factual",
  "confidence": 0.81,
  "probs": { "factual": 0.19, "non-factual": 0.81 }
}
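
A minimal sketch of how a pair of raw logits could be turned into a dictionary of this shape, using the class names from the Task row (`factual`, `non-factual`). The function name `format_prediction` and the example logits are hypothetical; the model itself returns logits only, so a wrapper like this is assumed.

```python
import math

ID2LABEL = {0: "factual", 1: "non-factual"}  # label convention from the Task row

def format_prediction(logits):
    """Convert raw class logits into a label/confidence/probs dictionary."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "label": ID2LABEL[best],
        "confidence": round(probs[best], 2),
        "probs": {ID2LABEL[i]: round(p, 2) for i, p in enumerate(probs)},
    }

# Illustrative logits chosen for the example, not actual model output.
result = format_prediction([-0.725, 0.725])
print(result)
```

Reporting `confidence` as the probability of the winning class makes it easy for a downstream system to apply its own threshold before acting on the label.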

Out of Scope

Non-English text
Numerical facts requiring external databases (e.g., live statistics or financial data)
Ethical or opinion-based classification tasks

Bias · Risks · Limitations

Trained only on English corpora; may mis-score culturally specific or multilingual statements.
Can misclassify sarcasm, humor, or figurative speech as “non-factual.”
Should be used as one critic in a multi-agent safeguard system, not as a standalone truth detector.

Usage Example

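from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# Repo ID written to match the URL in the Citation section
repo = "ajith-bondili/deberta-v3-factuality-small"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()  # make sure dropout is disabled for inference

id2label = {0: "factual", 1: "non-factual"}  # label convention from Model Details

text = "The Moon is made of cheese."
# Truncate to the 256-token limit used during fine-tuning
inputs = tok(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
probs = F.softmax(logits, dim=-1)[0]  # drop the batch dimension
label = int(torch.argmax(probs))
print({"label": id2label[label], "confidence": probs[label].item(), "probs": probs.tolist()})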

Citation

@software{bondili_2025_factuality,
  author = {Ajith Bondili},
  title  = {DeBERTa-v3-Small Factuality / Misinformation Classifier},
  year   = {2025},
  url    = {https://huggingface.co/ajith-bondili/deberta-v3-factuality-small}
}
