---
language: en
license: apache-2.0
tags:
- sentiment-analysis
- text-classification
- distilbert
- pytorch
- transformers
datasets:
- imdb
metrics:
- accuracy
- f1
widget:
- text: "This movie was absolutely amazing! Best film I've seen all year!"
  example_title: "Very Positive"
- text: "Pretty good movie, enjoyed it overall."
  example_title: "Slightly Positive"
- text: "It was okay, nothing special but not bad either."
  example_title: "Neutral"
- text: "Not a great movie, pretty disappointing."
  example_title: "Slightly Negative"
- text: "Terrible film, complete waste of time and money!"
  example_title: "Very Negative"
---

# DistilBERT 7-Class Sentiment Analysis Model

A fine-tuned DistilBERT model for nuanced sentiment analysis with 7 sentiment classes on a scale from -3 (Very Negative) to +3 (Very Positive).

## Model Description

This model performs fine-grained sentiment classification, providing more nuanced predictions than traditional binary positive/negative models. It's particularly useful for business applications where understanding the intensity of sentiment matters (e.g., identifying "at-risk" customers vs. extremely dissatisfied ones).

- **Architecture:** DistilBERT (distilbert-base-uncased)
- **Parameters:** 66 million
- **Training Data:** 6,000 IMDB movie reviews
- **Accuracy:** 73.7%

## Sentiment Classes

| Class | Scale | Label | Description |
|-------|-------|-------|-------------|
| 0 | -3 | Very Negative | Extremely dissatisfied, angry |
| 1 | -2 | Negative | Clearly unhappy, disappointed |
| 2 | -1 | Slightly Negative | Somewhat disappointed |
| 3 | 0 | Neutral | Balanced, neither positive nor negative |
| 4 | +1 | Slightly Positive | Somewhat satisfied |
| 5 | +2 | Positive | Clearly satisfied, happy |
| 6 | +3 | Very Positive | Extremely satisfied, delighted |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_id = "Thi144/sentiment-distilbert-7class"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Class mapping
CLASS_LABELS = {
    0: {"scale": -3, "label": "negative", "name": "Very Negative"},
    1: {"scale": -2, "label": "negative", "name": "Negative"},
    2: {"scale": -1, "label": "negative", "name": "Slightly Negative"},
    3: {"scale": 0, "label": "neutral", "name": "Neutral"},
    4: {"scale": 1, "label": "positive", "name": "Slightly Positive"},
    5: {"scale": 2, "label": "positive", "name": "Positive"},
    6: {"scale": 3, "label": "positive", "name": "Very Positive"}
}

# Predict sentiment
def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert logits to class probabilities
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    class_id = predictions.argmax().item()
    confidence = predictions[0][class_id].item()
    result = CLASS_LABELS[class_id]
    return {
        "class": class_id,
        "scale": result["scale"],
        "label": result["label"],
        "name": result["name"],
        "confidence": confidence
    }

# Example
result = predict_sentiment("This movie was absolutely amazing!")
print(f"Sentiment: {result['name']} (Scale: {result['scale']}, Confidence: {result['confidence']:.2%})")
```
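For quick experiments or batch scoring, the checkpoint can also be loaded through the high-level `pipeline` API. The sketch below is illustrative rather than part of the original example: it assumes the checkpoint is available under the `Thi144/sentiment-distilbert-7class` ID shown above, and the returned label strings depend on the `id2label` mapping stored in the model config, so they may surface as generic `LABEL_0`-`LABEL_6` names that correspond to the class IDs in the table.

```python
from transformers import pipeline

# Illustrative sketch: batch scoring with the high-level pipeline API.
classifier = pipeline("text-classification", model="Thi144/sentiment-distilbert-7class")

reviews = [
    "This movie was absolutely amazing! Best film I've seen all year!",
    "It was okay, nothing special but not bad either.",
    "Terrible film, complete waste of time and money!",
]

# Each prediction is a dict like {"label": ..., "score": ...}. Depending on the
# checkpoint's id2label config, "label" may be a generic LABEL_<class_id> string
# that maps onto the 7-point scale in the table above.
for review, prediction in zip(reviews, classifier(reviews, truncation=True)):
    print(f"{prediction['label']} ({prediction['score']:.2%}): {review}")
```

In recent versions of `transformers`, passing `top_k=None` to the pipeline call returns scores for all seven classes instead of only the top prediction.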
## Performance Metrics

**Overall Accuracy:** 73.7%

**Class-Specific Performance:**

- **Very Negative (-3):** 81% precision, 88% recall
- **Negative (-2):** 83% precision, 77% recall
- **Slightly Negative (-1):** 54% precision, 58% recall
- **Neutral (0):** 86% precision, 64% recall
- **Slightly Positive (+1):** 58% precision, 54% recall
- **Positive (+2):** 79% precision, 83% recall
- **Very Positive (+3):** 88% precision, 81% recall

The model performs best at identifying strong sentiments (Very Negative/Very Positive) and struggles most with subtle distinctions (Slightly Negative/Slightly Positive).

## Training Details

- **Base Model:** distilbert-base-uncased
- **Dataset:** 6,000 IMDB reviews (4,800 train, 1,200 test)
- **Label Conversion:** Binary labels converted to 7 classes using text intensity analysis
- **Epochs:** 4
- **Batch Size:** 16
- **Optimizer:** AdamW (lr=2e-5)
- **Training Time:** ~15-20 minutes on CPU

## Limitations

- Trained on movie reviews, so it may not generalize well to other domains
- The Slightly Negative and Slightly Positive classes have lower precision and recall (~54-58%)
- Performance depends on text clarity and length
- May struggle with sarcasm or complex sentiment

## Intended Use

**Primary Use Cases:**

- Customer feedback analysis with nuanced sentiment scoring
- Product review sentiment classification
- Social media monitoring with intensity detection
- Business intelligence dashboards requiring granular sentiment

**Not Recommended For:**

- Safety-critical applications
- Legal decision-making
- Medical diagnosis

## License

Apache 2.0

## Citation

If you use this model, please cite:

```
@misc{thi144-sentiment-distilbert-7class,
  author    = {Thi144},
  title     = {DistilBERT 7-Class Sentiment Analysis},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Thi144/sentiment-distilbert-7class}
}
```