DeBERTa IMDB Sentiment Analysis

A fine-tuned DeBERTa-v3-base model for binary sentiment classification of IMDB movie reviews. It reaches 96% accuracy, 7 percentage points above the LSTM baseline from the same study.

Model Description

This model classifies movie reviews as positive or negative. It was fine-tuned as part of a comparative study that measures transfer learning with a pre-trained transformer against an LSTM baseline trained from scratch.

Performance Comparison:

DeBERTa (this model): 96% accuracy
LSTM baseline: 89% accuracy
Improvement: +7 percentage points
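
The gap can also be expressed as a relative reduction in error rate, which is sometimes more informative than the raw percentage-point difference. The sketch below uses only the two accuracy figures reported above; the function name is illustrative.

```python
def error_reduction(baseline_acc: float, model_acc: float) -> float:
    """Relative reduction in error rate when moving from baseline to model."""
    baseline_err = 1.0 - baseline_acc
    model_err = 1.0 - model_acc
    return (baseline_err - model_err) / baseline_err


lstm_acc = 0.89     # LSTM baseline accuracy reported above
deberta_acc = 0.96  # DeBERTa accuracy reported above

gain_points = (deberta_acc - lstm_acc) * 100
relative = error_reduction(lstm_acc, deberta_acc)

print(f"Absolute gain: {gain_points:.0f} percentage points")
print(f"Relative error reduction: {relative:.1%}")
```

With the reported figures, the error rate drops from 11% to 4%, a relative reduction of about 64%.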

Quick Start

from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="radwa-f/DeBERTA-Imdb-SentimentAnalysis"
)

result = classifier("This movie was absolutely brilliant!")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9987}]

result = classifier("Worst movie I've ever seen. Complete waste of time.")
print(result)
# [{'label': 'NEGATIVE', 'score': 0.9991}]

Detailed Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("radwa-f/DeBERTA-Imdb-SentimentAnalysis")
model = AutoModelForSequenceClassification.from_pretrained("radwa-f/DeBERTA-Imdb-SentimentAnalysis")

text = "The cinematography was stunning and the story was captivating!"
# Truncate to the model's 512-token limit
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Inference only: disable gradient tracking
with torch.no_grad():
    outputs = model(**inputs)
    # Convert logits to class probabilities
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Single input, so the batch dimension has size 1
predicted_class = predictions.argmax(dim=-1).item()
confidence = predictions[0][predicted_class].item()

labels = ["NEGATIVE", "POSITIVE"]  # index 0 = negative, index 1 = positive
print(f"Prediction: {labels[predicted_class]} (confidence: {confidence:.2%})")
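
For readers unfamiliar with the softmax step, the sketch below reproduces it in plain Python for a single two-class logit vector. The logit values are made up for illustration, and the label order assumes index 0 is negative and index 1 is positive, as in the labels list above.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]  # assumed order: index 0 negative, 1 positive
example_logits = [-2.0, 3.0]       # hypothetical model output, not a real measurement

probs = softmax(example_logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(f"Prediction: {labels[predicted]} (confidence: {probs[predicted]:.2%})")
```

The reported confidence is simply the largest softmax probability, so it reflects how far apart the two logits are and is not necessarily a calibrated likelihood of being correct.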

Training Details

Dataset

Source: IMDB Movie Reviews Dataset
Size: 50,000 reviews (25,000 train, 25,000 test)
Classes: Binary (Positive/Negative)
Balance: 50/50 split between positive and negative reviews
Average Length: ~250 words per review

Base Model

Architecture: DeBERTa-v3-base (microsoft/deberta-v3-base)
Parameters: ~184M
Tokenizer: DeBERTa tokenizer with 128K vocabulary

Training Procedure

Hyperparameters:

Optimizer: AdamW
Learning Rate: 2e-5
Weight Decay: 0.01
Batch Size: 16
Epochs: 3
Max Sequence Length: 512 tokens
Warmup Steps: 500
Scheduler: Linear with warmup
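
As a rough illustration of what a linear schedule with warmup does under these settings, the sketch below computes the learning rate at a given optimizer step. It assumes 20,000 training examples (the 80% share of the 25,000-review training set described under Training Data Processing), no gradient accumulation, and a single device. None of these assumptions is confirmed by the card, and the function is a plain-Python approximation, not the authors' training code.

```python
import math

BASE_LR = 2e-5
WARMUP_STEPS = 500
BATCH_SIZE = 16
EPOCHS = 3
TRAIN_EXAMPLES = 20_000  # assumption: 80% of the 25,000 training reviews

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def linear_warmup_lr(step: int) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / (total_steps - WARMUP_STEPS)


for step in (0, 250, 500, 2125, total_steps):
    print(f"step {step:>4}: lr = {linear_warmup_lr(step):.2e}")
```

Under these assumptions the run has 3,750 optimizer steps, so the 500 warmup steps cover roughly the first 13% of training.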

Hardware:

GPU: NVIDIA A100 (Google Colab)
Training Time: 33 min

Framework:

PyTorch 2.0+
Hugging Face Transformers 4.30+
CUDA 11.8

Training Data Processing

Text cleaning and normalization
HTML tag removal
Tokenization with DeBERTa tokenizer
Padding/truncation to 512 tokens
80/20 train/validation split from training set
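
The card does not say how the cleaning and splitting were implemented. The sketch below shows one plausible version of the HTML removal, normalization, and 80/20 split steps using only the standard library. The regular expression, the seed, and the helper names are assumptions, and the split here is a plain random split, not a stratified one.

```python
import html
import random
import re

TAG_RE = re.compile(r"<[^>]+>")


def clean_review(text: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def train_val_split(examples: list, val_fraction: float = 0.2, seed: int = 42):
    """Shuffle a copy of `examples` and split off a validation set."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


raw = "Great film!<br /><br />The cast &amp; crew   did a fine job."
print(clean_review(raw))

reviews = [(f"review {i}", i % 2) for i in range(100)]  # toy (text, label) pairs
train, val = train_val_split(reviews)
print(len(train), len(val))
```

IMDB reviews commonly contain `<br />` line breaks, which is why tag removal comes before tokenization.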

Evaluation

Test Set Performance

Metric	Score
Accuracy	96.0%
F1-Score	0.96
Precision	0.96
Recall	0.96
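
For reference, all four metrics follow from the confusion-matrix counts. The card does not publish the actual confusion matrix, so the counts in this sketch are hypothetical values chosen to be consistent with a balanced 25,000-review test set and the scores above.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, NOT the model's real confusion matrix.
metrics = binary_metrics(tp=12_000, fp=500, fn=500, tn=12_000)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```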

Comparison Study

This model was developed as part of a comparative analysis of transfer learning versus traditional approaches:

Model	Architecture	Accuracy	Training Time
DeBERTa (this)	Transformer + Transfer Learning	96%	~2 hours
LSTM Baseline	Recurrent Neural Network	89%	~4 hours

Key Finding: In this study, fine-tuning a pre-trained transformer outperformed an LSTM trained from scratch by 7 percentage points of accuracy (96% vs 89%).

Confusion Matrix Analysis

Positive reviews: high precision
Negative reviews: high precision
Misclassifications: mostly sarcastic or nuanced reviews

Intended Uses

Primary Use Cases

Movie review sentiment analysis
Educational demonstrations of transfer learning
Baseline model for sentiment classification research
Product review analysis (similar domains)

Downstream Applications

Automated review aggregation
Content recommendation systems
Market research and opinion mining
Customer feedback analysis

Limitations

Known Limitations:

Trained specifically on movie reviews; may not generalize perfectly to other domains
Struggles with sarcasm and highly nuanced sentiment
Maximum input length of 512 tokens (longer reviews are truncated)
English language only
May reflect biases present in IMDB review dataset
Performance may degrade on reviews from different time periods or cultures
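
One common workaround for the 512-token limit is to split a long review into overlapping windows, classify each window, and average the class probabilities. This is not part of the original training or evaluation, and its effect on accuracy has not been measured here. The window size, stride, and helper names below are assumptions, and the token IDs are stand-ins for real tokenizer output.

```python
def sliding_windows(token_ids: list[int], max_len: int = 510, stride: int = 256) -> list[list[int]]:
    """Split a token sequence into overlapping windows of at most max_len tokens.

    max_len defaults to 510 to leave room for the two special tokens
    that the tokenizer adds around each sequence.
    """
    if len(token_ids) <= max_len:
        return [token_ids]
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += stride
    return windows


def average_probs(per_window_probs: list[list[float]]) -> list[float]:
    """Average class probabilities across windows."""
    n = len(per_window_probs)
    return [sum(col) / n for col in zip(*per_window_probs)]


token_ids = list(range(1000))  # stand-in for a tokenized 1,000-token review
windows = sliding_windows(token_ids)
print([(w[0], w[-1]) for w in windows])
```

Averaging is only one aggregation choice. Taking the prediction from the most confident window is another, and which works better would need to be checked on long reviews.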

Not Suitable For:

Real-time streaming applications (inference time ~100ms per review)
Non-English text
Highly domain-specific jargon outside entertainment
Multi-class sentiment (only binary: positive/negative)

Bias and Ethical Considerations

Potential Biases:

IMDB dataset may over-represent certain demographics and film genres
Model may perform differently on independent/international films
Temporal bias: trained on historical reviews, may not capture evolving language
May inherit biases from pre-training corpus

Responsible Use:

Should not be used as sole basis for critical decisions
Human review recommended for ambiguous cases
Be aware of domain adaptation limitations
Consider fairness implications when deploying
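
One simple way to act on the human-review recommendation is to route low-confidence predictions to a reviewer. The sketch below assumes predictions arrive in the same shape the pipeline returns in Quick Start. The 0.90 threshold and the function name are hypothetical, and the threshold should be tuned on validation data.

```python
def needs_human_review(prediction: dict, threshold: float = 0.90) -> bool:
    """Flag a prediction whose top-class score falls below the threshold."""
    return prediction["score"] < threshold


# Stand-in outputs shaped like the pipeline results in Quick Start.
predictions = [
    {"label": "POSITIVE", "score": 0.9987},
    {"label": "NEGATIVE", "score": 0.6120},
]

for pred in predictions:
    route = "human review" if needs_human_review(pred) else "auto-accept"
    print(f"{pred['label']} ({pred['score']:.2%}) -> {route}")
```

A high score does not guarantee a correct label, particularly for sarcastic reviews, so thresholding reduces the need for spot checks but does not remove it.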

Model Creators

Radwa Fattouhi

Final-year Software Engineering Student
École Nationale des Sciences Appliquées (ENSA), El Jadida, Morocco
LinkedIn: radwa-fattouhi
GitHub: radwa-f

Amine Boktaya

Final-year Software Engineering Student
École Nationale des Sciences Appliquées (ENSA), El Jadida, Morocco
LinkedIn: amine-boktaya
GitHub: BoktayaAmine

Citation

If you use this model in your research, please cite:

@misc{fattouhi2025imdb,
  author = {Fattouhi, Radwa and Boktaya, Amine},
  title = {DeBERTa IMDB Sentiment Analysis: Transfer Learning vs Traditional Approaches},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/radwa-f/DeBERTA-Imdb-SentimentAnalysis}}
}

Related Work

Published Research:

AgriAlertX: Climate-driven disaster prevention for agriculture
Journal: SoftwareX (Elsevier)
DOI: 10.1016/j.softx.2025.102350

Related Projects:

Riot Detection System (DeBERTa for social media classification)
Tweet Detoxification (BART for style transfer)

Acknowledgments

Base model: Microsoft Research for DeBERTa-v3
Dataset: IMDB for the movie review dataset
Framework: Hugging Face Transformers team

Model card last updated: December 2025
