---
library_name: transformers
license: apache-2.0
base_model: distilbert-base-uncased
tags:
  - text-classification
  - sentiment-analysis
  - emotion-classification
  - generated_from_trainer
metrics:
  - f1_macro
  - accuracy
model-index:
  - name: SentimentAnalysis-distilbert-base-uncased-finetuned-emotion
    results:
      - task:
          type: text-classification
          name: Emotion Classification
        dataset:
          name: Emotion Dataset
          type: text
        metrics:
          - name: F1 Macro
            type: f1_macro
            value: 0.8902
          - name: Accuracy
            type: accuracy
            value: 0.927
---

# SentimentAnalysis-distilbert-base-uncased-finetuned-emotion

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) for emotion classification of short texts (tweets).
It predicts one of six emotions:

- sadness
- joy
- love
- anger
- fear
- surprise
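
Assuming the label ids follow the order listed above (a common convention for emotion datasets — verify against the checkpoint's `config.json`), the label mapping looks like:

```json
{
  "id2label": {
    "0": "sadness",
    "1": "joy",
    "2": "love",
    "3": "anger",
    "4": "fear",
    "5": "surprise"
  }
}
```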

The model was trained using the 🤗 Transformers Trainer API with class-weighted loss to handle class imbalance.
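
The class weighting can be sketched as below. This is an illustrative reconstruction, not the original training code: `compute_class_weights` is a hypothetical helper, and the `Trainer` subclass is shown as a comment because it requires 🤗 Transformers at runtime.

```python
# Inverse-frequency class weights for the weighted cross-entropy loss
# (illustrative sketch; names are not from the original training code).
from collections import Counter

def compute_class_weights(labels, num_classes):
    """Weight each class by total / (num_classes * count): rare classes
    receive larger weights, so the loss does not ignore minority emotions."""
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts.get(c, 1)) for c in range(num_classes)]

# With the 🤗 Trainer API, such weights are typically applied by
# overriding compute_loss, e.g.:
#
#   import torch
#   from transformers import Trainer
#
#   class WeightedLossTrainer(Trainer):
#       def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
#           labels = inputs.pop("labels")
#           outputs = model(**inputs)
#           loss_fct = torch.nn.CrossEntropyLoss(
#               weight=torch.tensor(class_weights,
#                                   device=outputs.logits.device))
#           loss = loss_fct(outputs.logits, labels)
#           return (loss, outputs) if return_outputs else loss
```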


## Model performance

Evaluation results on the test set:

- Loss: 0.2094
- F1 Macro: 0.8902
- Accuracy: 0.927

Note: F1 Macro is the primary metric since the dataset is imbalanced.
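
To see why: macro F1 averages per-class F1 scores, so minority emotions (e.g. surprise) count as much as the majority class. A minimal pure-Python version, equivalent to `sklearn.metrics.f1_score(..., average="macro")`:

```python
def f1_macro(y_true, y_pred, num_classes):
    """Average the per-class F1 scores, weighting every class equally."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return sum(scores) / num_classes

# Predicting the majority class everywhere scores high accuracy (0.75)
# but low macro F1, which is exactly the failure mode this metric exposes.
```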


## Intended uses

This model is suitable for:

- Emotion analysis of tweets and other short social-media texts
- NLP research and educational projects
- Sentiment-aware chatbots or analytics dashboards

## Limitations

- Trained on short texts (tweets); performance may degrade on long documents
- English-only
- May inherit biases present in the training data
- Not intended for high-stakes or sensitive decision-making

## Training data

The model was trained on an emotion-labeled tweet dataset with six emotion classes, split into training, validation, and test sets.

Preprocessing steps included:

- Tokenization with the DistilBERT tokenizer
- Padding and truncation to a fixed maximum length
- Label encoding with the Hugging Face `ClassLabel` feature
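
The padding/truncation step can be illustrated in pure Python (a toy sketch; the actual preprocessing relies on the tokenizer's built-in `padding`/`truncation` arguments, shown in the comment):

```python
# Toy illustration of fixed-length padding and truncation
# (the real preprocessing uses the DistilBERT tokenizer directly).
def pad_and_truncate(token_ids, max_length, pad_id=0):
    kept = min(len(token_ids), max_length)
    ids = token_ids[:max_length] + [pad_id] * (max_length - kept)
    attention_mask = [1] * kept + [0] * (max_length - kept)
    return ids, attention_mask

# Equivalent with 🤗 Transformers (requires downloading the tokenizer;
# max_length=128 is an illustrative choice, not taken from the model card):
#
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
#   encoded = dataset.map(
#       lambda batch: tokenizer(batch["text"], padding="max_length",
#                               truncation=True, max_length=128),
#       batched=True)
```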

## Training procedure

### Hyperparameters

- Base model: distilbert-base-uncased
- Learning rate: 2e-5
- Train batch size: 32
- Eval batch size: 32
- Epochs: 10
- Optimizer: AdamW (betas = (0.9, 0.999), epsilon = 1e-8)
- Learning rate scheduler: linear
- Loss function: cross-entropy with class weights
- Seed: 42
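
For reference, these hyperparameters map onto `TrainingArguments` roughly as follows. This is a sketch, not the original script: `output_dir` and the evaluation/best-model options are assumptions, and the AdamW betas/epsilon above are already the optimizer defaults.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="finetuned-emotion",    # illustrative path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=10,
    lr_scheduler_type="linear",        # the default scheduler
    seed=42,
    eval_strategy="epoch",             # assumed: eval once per epoch
    save_strategy="epoch",
    load_best_model_at_end=True,       # matches "best model selected by macro F1"
    metric_for_best_model="f1_macro",
)
```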

## Training results

| Training Loss | Epoch | Step | Validation Loss | F1 Macro | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|
| 0.6445        | 1.0   | 500  | 0.2374          | 0.8993   | 0.9235   |
| 0.1811        | 2.0   | 1000 | 0.1683          | 0.9109   | 0.9340   |
| 0.1357        | 3.0   | 1500 | 0.1686          | 0.9157   | 0.9380   |
| 0.1036        | 4.0   | 2000 | 0.1737          | 0.9192   | 0.9400   |
| 0.0816        | 5.0   | 2500 | 0.2204          | 0.9086   | 0.9345   |
| 0.0629        | 6.0   | 3000 | 0.2197          | 0.9142   | 0.9385   |
| 0.0475        | 7.0   | 3500 | 0.3064          | 0.9081   | 0.9355   |

The best model was selected based on macro F1 score.


## Framework versions

- Transformers: 4.44.2
- PyTorch: 2.6.0+cu124
- Datasets: 4.4.1
- Tokenizers: 0.19.1

## Source code

Training and evaluation code is available on GitHub:
👉 https://github.com/Abdelrahmanemam01/Sentiment-Analysis