Update README.md with comprehensive dual-dataset evaluation

Major Updates:
- Add VLSP2016 general sentiment dataset to supported datasets and model-index
- Update title from "Banking Aspect Sentiment" to "Vietnamese Sentiment Analysis System"
- Add comprehensive performance metrics for both datasets:
  * VLSP2016: 71.14% (SVC), 70.19% (LR) test accuracy with balanced per-class performance
  * UTS2017_Bank: 71.72% (SVC), 68.18% (LR) test accuracy with detailed aspect-sentiment analysis

Enhanced Documentation:
- Dataset selection examples with --dataset vlsp2016|uts2017 parameter
- Dual-model usage examples for general vs banking sentiment analysis
- Cross-dataset performance analysis and insights
- N-gram comparison results (bigrams vs trigrams)

New Features Documented:
- clean.py utility for managing training runs
- Project management section with cleanup workflows
- Updated model export naming with dataset prefixes
- Enhanced ethical considerations and limitations

Performance Insights:
- Consistent ~71% SVC test accuracy across the 3-class (VLSP2016) and 35-class (UTS2017_Bank) tasks
- Balanced dataset (VLSP2016) gives more even per-class performance
- Imbalanced dataset (UTS2017_Bank) shows larger per-class performance variation
- N-gram range 1-2 outperforms 1-3 on VLSP2016 (71.14% vs 70.67%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Files changed (1)

README.md +164 -44

@@ -17,6 +17,7 @@ tags:
   - financial-nlp
 datasets:
   - undertheseanlp/UTS2017_Bank
 metrics:
   - accuracy
   - precision
@@ -25,6 +26,25 @@ metrics:
 model-index:
   - name: pulse-core-1
     results:
       - task:
           type: text-classification
           name: Vietnamese Banking Aspect Sentiment Analysis
@@ -55,15 +75,15 @@ language:
 pipeline_tag: text-classification
 ---
-# Pulse Core 1 - Vietnamese Banking Aspect Sentiment Analysis
-A machine learning-based aspect sentiment analysis model designed for Vietnamese banking text processing. Built on TF-IDF feature extraction pipeline combined with various machine learning algorithms, achieving **71.72% accuracy** on UTS2017_Bank aspect sentiment dataset with Support Vector Classification (SVC).
 📋 **[View Detailed System Card](https://huggingface.co/undertheseanlp/pulse_core_1/blob/main/paper/pulse_core_1_technical_report.tex)** for comprehensive model documentation, performance analysis, and limitations.
 ## Model Description
-**Pulse Core 1** is a Vietnamese banking aspect sentiment analysis model that analyzes both the aspect (what the text is about) and sentiment (positive/negative/neutral) of Vietnamese banking-related text. The model is specifically designed for Vietnamese banking customer feedback analysis, banking service categorization, and sentiment analysis for Vietnamese financial services.
 ### Model Architecture
@@ -75,7 +95,20 @@ A machine learning-based aspect sentiment analysis model designed for Vietnamese
 - **Framework**: scikit-learn ≥1.6
 - **Caching System**: Hash-based caching for efficient processing
-## Supported Dataset & Categories
 ### UTS2017_Bank Dataset - Banking Aspect Sentiment (35 combined classes)
@@ -115,22 +148,40 @@ pip install scikit-learn>=1.6 joblib
 ### Training the Model
-#### UTS2017_Bank Dataset (Banking Aspect Sentiment Analysis)
 ```bash
-# Default training with Logistic Regression
-python train.py --model logistic
 # Train with SVC for better performance
-python train.py --model svc_linear
 # With specific parameters
-python train.py --model logistic --max-features 20000 --ngram-min 1 --ngram-max 2
 # Export model for deployment
-python train.py --model logistic --export-model
-# Compare multiple models
-python train.py --compare-models logistic svc_linear
 ```
 ### Training from Scratch
@@ -138,8 +189,19 @@ python train.py --compare-models logistic svc_linear
 ```python
 from train import train_notebook
 # Train UTS2017_Bank aspect sentiment model
 results = train_notebook(
     model_name="logistic",
     max_features=20000,
     ngram_min=1,
@@ -147,28 +209,48 @@ results = train_notebook(
     export_model=True
 )
-# Compare multiple models
 comparison_results = train_notebook(
     compare=True
 )
 ```
 ## Performance Metrics
 ### UTS2017_Bank Aspect Sentiment Analysis Performance
-- **Training Accuracy**: 94.31%
-- **Test Accuracy**: 71.72%
 - **Training Samples**: 1,581
 - **Test Samples**: 396
 - **Number of Classes**: 35 aspect-sentiment combinations
-- **Training Time**: ~7.71 seconds
 - **Best Performing Classes**:
   - `TRADEMARK#positive`: 90% F1-score
   - `CUSTOMER_SUPPORT#positive`: 88% F1-score
-  - `LOAN#negative`: 67% F1-score
   - `CUSTOMER_SUPPORT#negative`: 65% F1-score
 - **Challenges**: Class imbalance affects minority aspect-sentiment combinations
-- **Model Type**: Support Vector Classification (SVC) with TF-IDF (20k features, 1-2 ngrams)
 ## Using the Pre-trained Models
@@ -177,29 +259,37 @@ comparison_results = train_notebook(
 ```python
 import joblib
-# Load local exported model
-sentiment_model = joblib.load("uts2017_sentiment_20250928_131716.joblib")
 # Or use inference script directly
 from inference import predict_text
-# Make prediction on banking text
-bank_text = "Tôi muốn mở tài khoản tiết kiệm"
-prediction, confidence, top_predictions = predict_text(sentiment_model, bank_text)
-print(f"Aspect-Sentiment: {prediction}")
 print(f"Confidence: {confidence:.3f}")
 print("Top 3 predictions:")
 for i, (category, prob) in enumerate(top_predictions, 1):
     print(f"  {i}. {category}: {prob:.3f}")
-# Example output:
-# Aspect-Sentiment: CUSTOMER_SUPPORT#negative
-# Confidence: 0.301
 # Top 3 predictions:
-#   1. CUSTOMER_SUPPORT#negative: 0.301
-#   2. TRADEMARK#positive: 0.187
-#   3. CUSTOMER_SUPPORT#positive: 0.095
 ```
 ### Using the Inference Script
@@ -221,30 +311,59 @@ python inference.py --list-models
 ## Model Parameters
 - `model`: Model type ("logistic", "svc_linear", "svc_rbf", "naive_bayes", "decision_tree", "random_forest", etc.)
 - `max_features`: Maximum number of TF-IDF features (default: 20000)
-- `ngram_min/max`: N-gram range (default: 1-2)
-- `split_ratio`: Train/test split ratio (default: 0.2)
 - `n_samples`: Optional sample limit for quick testing
-- `export_model`: Export model for deployment (creates `uts2017_sentiment_<timestamp>.joblib`)
 ## Limitations
 1. **Language Specificity**: Only works with Vietnamese text
-2. **Domain Specificity**: Optimized specifically for Vietnamese banking domain
 3. **Feature Limitations**: Limited to 20,000 most frequent features
-4. **Class Imbalance Sensitivity**: Performance degrades significantly with imbalanced aspect-sentiment combinations
 5. **Specific Weaknesses**:
-   - Poor performance on minority aspect-sentiment classes due to insufficient training data
-   - Limited to banking domain aspects (account, loan, card, etc.)
-   - Sentiment analysis accuracy varies by aspect type
 ## Ethical Considerations
-- Model reflects biases present in training datasets
-- Performance varies significantly across categories
-- Should be validated on target domain before deployment
-- Consider class imbalance when interpreting results
 ## Citation
@@ -252,8 +371,9 @@ If you use this model, please cite:
 ```bibtex
 @misc{undertheseanlp_2025,
-    author       = { undertheseanlp },
-    title        = { Pulse Core 1 - Vietnamese Text Classification Model },
     year         = 2025,
     url          = { https://huggingface.co/undertheseanlp/pulse_core_1 },
     doi          = { 10.57967/hf/6605 },

   - financial-nlp
 datasets:
   - undertheseanlp/UTS2017_Bank
+  - ura-hcmut/vlsp2016
 metrics:
   - accuracy
   - precision
 model-index:
   - name: pulse-core-1
     results:
+      - task:
+          type: text-classification
+          name: Vietnamese General Sentiment Analysis
+        dataset:
+          name: VLSP2016
+          type: ura-hcmut/vlsp2016
+        metrics:
+          - type: accuracy
+            value: 0.7114
+            name: Test Accuracy (SVC Linear)
+          - type: accuracy
+            value: 0.7019
+            name: Test Accuracy (Logistic Regression)
+          - type: f1-score
+            value: 0.713
+            name: Weighted F1-Score (SVC)
+          - type: f1-score
+            value: 0.703
+            name: Weighted F1-Score (Logistic Regression)
       - task:
           type: text-classification
           name: Vietnamese Banking Aspect Sentiment Analysis
 pipeline_tag: text-classification
 ---
+# Pulse Core 1 - Vietnamese Sentiment Analysis System
+A machine learning-based sentiment analysis system for Vietnamese text processing. Built on a TF-IDF feature extraction pipeline combined with various machine learning algorithms, it achieves **71.14% test accuracy** on the VLSP2016 general sentiment dataset and **71.72% test accuracy** on the UTS2017_Bank banking aspect sentiment dataset with Support Vector Classification (SVC).
 📋 **[View Detailed System Card](https://huggingface.co/undertheseanlp/pulse_core_1/blob/main/paper/pulse_core_1_technical_report.tex)** for comprehensive model documentation, performance analysis, and limitations.
 ## Model Description
+**Pulse Core 1** is a versatile Vietnamese sentiment analysis system that supports both general sentiment classification and specialized banking aspect sentiment analysis. The system can analyze general Vietnamese text sentiment (positive/negative/neutral) and banking-specific aspect sentiment (combining banking aspects with sentiment polarities). It's designed for Vietnamese text analysis across multiple domains, with specialized capabilities for banking customer feedback analysis and financial service categorization.
 ### Model Architecture
 - **Framework**: scikit-learn ≥1.6
 - **Caching System**: Hash-based caching for efficient processing
+## Supported Datasets & Categories
+### VLSP2016 Dataset - General Sentiment Analysis (3 classes)
+**Sentiment Categories:**
+- **positive** - Positive sentiment towards products/services
+- **negative** - Negative sentiment towards products/services
+- **neutral** - Neutral or mixed sentiment
+**Dataset Statistics:**
+- Training samples: 5,100 (1,700 per class)
+- Test samples: 1,050 (350 per class)
+- Balanced distribution across all sentiment classes
+- Domain: General product and service reviews
 ### UTS2017_Bank Dataset - Banking Aspect Sentiment (35 combined classes)
 ### Training the Model
+#### Dataset Selection and Training
+**VLSP2016 Dataset (General Sentiment Analysis):**
 ```bash
+# Train on VLSP2016 with Logistic Regression
+python train.py --dataset vlsp2016 --model logistic
 # Train with SVC for better performance
+python train.py --dataset vlsp2016 --model svc_linear
+# Compare n-gram ranges
+python train.py --dataset vlsp2016 --model svc_linear --ngram-min 1 --ngram-max 2
+python train.py --dataset vlsp2016 --model svc_linear --ngram-min 1 --ngram-max 3
+# Export model for deployment
+python train.py --dataset vlsp2016 --model svc_linear --export-model
+```
+**UTS2017_Bank Dataset (Banking Aspect Sentiment Analysis):**
+```bash
+# Train on UTS2017_Bank (default dataset)
+python train.py --dataset uts2017 --model logistic
+# Train with SVC for better performance
+python train.py --dataset uts2017 --model svc_linear
 # With specific parameters
+python train.py --dataset uts2017 --model logistic --max-features 20000 --ngram-min 1 --ngram-max 2
 # Export model for deployment
+python train.py --dataset uts2017 --model logistic --export-model
+# Compare multiple models on a specific dataset (VLSP2016 shown here)
+python train.py --dataset vlsp2016 --compare-models logistic svc_linear
 ```
 ### Training from Scratch
 ```python
 from train import train_notebook
+# Train VLSP2016 general sentiment model
+results = train_notebook(
+    dataset="vlsp2016",
+    model_name="svc_linear",
+    max_features=20000,
+    ngram_min=1,
+    ngram_max=2,
+    export_model=True
+)
 # Train UTS2017_Bank aspect sentiment model
 results = train_notebook(
+    dataset="uts2017",
     model_name="logistic",
     max_features=20000,
     ngram_min=1,
     export_model=True
 )
+# Compare multiple models on VLSP2016
 comparison_results = train_notebook(
+    dataset="vlsp2016",
     compare=True
 )
 ```
 ## Performance Metrics
+### VLSP2016 General Sentiment Analysis Performance
+- **Training Accuracy**: 94.57% (SVC Linear)
+- **Test Accuracy**: 71.14% (SVC Linear, 1-2 ngram) / 70.67% (SVC Linear, 1-3 ngram) / 70.19% (Logistic Regression)
+- **Training Samples**: 5,100 (balanced: 1,700 per class)
+- **Test Samples**: 1,050 (balanced: 350 per class)
+- **Number of Classes**: 3 sentiment polarities
+- **Training Time**: ~24.95 seconds (SVC) / 0.75 seconds (LR)
+- **Per-Class Performance (SVC Linear)**:
+  - **Positive**: 80% precision, 72% recall, 76% F1-score
+  - **Negative**: 70% precision, 72% recall, 71% F1-score
+  - **Neutral**: 65% precision, 69% recall, 67% F1-score
+- **Key Insights**: Per-class F1-scores fall within a 9-point range (67-76%), which is consistent with the balanced class distribution
+- **Optimal N-gram**: N-gram range 1-2 (unigrams + bigrams) outperforms range 1-3 (adding trigrams) by 0.47 percentage points (71.14% vs 70.67%)
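The per-class F1-scores listed above are the harmonic mean of the listed precision and recall values. The snippet below is a small arithmetic check rather than part of the project's code; the `f1` helper and the `per_class` dictionary are illustrative names introduced here.

```python
# Precision/recall pairs copied from the VLSP2016 per-class list (SVC Linear).
per_class = {
    "positive": (0.80, 0.72),
    "negative": (0.70, 0.72),
    "neutral": (0.65, 0.69),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


scores = {label: f1(p, r) for label, (p, r) in per_class.items()}
for label, score in scores.items():
    print(f"{label}: {score:.2f}")

# With 350 test samples per class, the macro and weighted averages coincide.
macro_f1 = sum(scores.values()) / len(scores)
print(f"macro F1: {macro_f1:.3f}")
```

Because the inputs are already rounded to whole percentages, the average computed this way can differ in the third decimal from the weighted F1-score of 0.713 reported in the model-index metadata.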
 ### UTS2017_Bank Aspect Sentiment Analysis Performance
+- **Training Accuracy**: 94.57% (SVC)
+- **Test Accuracy**: 71.72% (SVC) / 68.18% (Logistic Regression)
 - **Training Samples**: 1,581
 - **Test Samples**: 396
 - **Number of Classes**: 35 aspect-sentiment combinations
+- **Training Time**: ~5.3 seconds (SVC) / 2.13 seconds (LR)
 - **Best Performing Classes**:
   - `TRADEMARK#positive`: 90% F1-score
   - `CUSTOMER_SUPPORT#positive`: 88% F1-score
+  - `LOAN#negative`: 67% F1-score (SVC improvement over LR)
   - `CUSTOMER_SUPPORT#negative`: 65% F1-score
 - **Challenges**: Class imbalance affects minority aspect-sentiment combinations
+- **Key Finding**: SVC predicts a more diverse set of categories than Logistic Regression
+### Cross-Dataset Performance Analysis
+- **Consistent SVC Performance**: ~71% accuracy on both 3-class (VLSP2016) and 35-class (UTS2017_Bank) tasks
+- **Balance Impact**: Balanced datasets (VLSP2016) yield consistent per-class results while imbalanced datasets create performance variations
+- **Training Efficiency**: SVC training takes longer on the larger VLSP2016 dataset (~24.95 s for 5,100 samples) than on UTS2017_Bank (~5.3 s for 1,581 samples)
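To put the reported percentages in perspective, they can be converted back into approximate counts of correctly classified test samples. This is a rough sketch based only on the rounded figures in this README; `implied_correct` and the `reported` table are illustrative names, and the counts are implied values, not numbers taken from the training logs.

```python
# Test-set sizes and test accuracies copied from the metrics in this README.
reported = {
    ("VLSP2016", "SVC 1-2"): (1050, 0.7114),
    ("VLSP2016", "SVC 1-3"): (1050, 0.7067),
    ("VLSP2016", "LR"): (1050, 0.7019),
    ("UTS2017_Bank", "SVC"): (396, 0.7172),
    ("UTS2017_Bank", "LR"): (396, 0.6818),
}


def implied_correct(n_test: int, accuracy: float) -> int:
    """Number of correct predictions implied by a rounded accuracy."""
    return round(n_test * accuracy)


counts = {key: implied_correct(n, acc) for key, (n, acc) in reported.items()}
for (dataset, model), correct in counts.items():
    n_test = reported[(dataset, model)][0]
    print(f"{dataset} {model}: {correct}/{n_test}")

# Gap between the two n-gram settings, expressed in test samples.
ngram_gap = counts[("VLSP2016", "SVC 1-2")] - counts[("VLSP2016", "SVC 1-3")]
print(f"1-2 vs 1-3 n-gram gap: {ngram_gap} test samples")
```

Seen this way, the 0.47-point difference between the two n-gram ranges amounts to a handful of test reviews, so the ranking is probably best treated as tentative.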
 ## Using the Pre-trained Models
 ```python
 import joblib
+# Load VLSP2016 general sentiment model
+general_model = joblib.load("vlsp2016_sentiment_20250929_075529.joblib")
+# Load UTS2017_Bank aspect sentiment model
+banking_model = joblib.load("uts2017_sentiment_20250928_131716.joblib")
 # Or use inference script directly
 from inference import predict_text
+# General sentiment analysis
+general_text = "Sản phẩm này rất tốt, tôi rất hài lòng"
+prediction, confidence, top_predictions = predict_text(general_model, general_text)
+print(f"General Sentiment: {prediction}")  # Expected: positive
+# Banking aspect sentiment analysis
+bank_text = "Lãi suất vay mua nhà hiện tại quá cao"
+prediction, confidence, top_predictions = predict_text(banking_model, bank_text)
+print(f"Banking Aspect-Sentiment: {prediction}")  # Expected: INTEREST_RATE#negative
 print(f"Confidence: {confidence:.3f}")
 print("Top 3 predictions:")
 for i, (category, prob) in enumerate(top_predictions, 1):
     print(f"  {i}. {category}: {prob:.3f}")
+# Example output for banking text:
+# Banking Aspect-Sentiment: INTEREST_RATE#negative
+# Confidence: 0.509
 # Top 3 predictions:
+#   1. INTEREST_RATE#negative: 0.509
+#   2. LOAN#negative: 0.218
+#   3. CUSTOMER_SUPPORT#negative: 0.095
 ```
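Exported model files are named `<dataset>_sentiment_<timestamp>.joblib`, and the two filenames used above suggest a `YYYYMMDD_HHMMSS` timestamp. Under that assumption, the newest export for a dataset could be selected as sketched below. `latest_export` and `EXPORT_PATTERN` are hypothetical helpers and are not part of `inference.py`.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional

# Pattern inferred from the two example filenames; treat it as an assumption.
EXPORT_PATTERN = re.compile(
    r"^(?P<dataset>[a-z0-9]+)_sentiment_(?P<stamp>\d{8}_\d{6})\.joblib$"
)


def latest_export(filenames: Iterable[str], dataset: str) -> Optional[str]:
    """Return the newest exported filename for `dataset`, or None if absent."""
    dated = []
    for name in filenames:
        match = EXPORT_PATTERN.match(Path(name).name)
        if match and match.group("dataset") == dataset:
            stamp = datetime.strptime(match.group("stamp"), "%Y%m%d_%H%M%S")
            dated.append((stamp, name))
    return max(dated)[1] if dated else None


files = [
    "vlsp2016_sentiment_20250929_075529.joblib",
    "uts2017_sentiment_20250928_131716.joblib",
]
print(latest_export(files, "vlsp2016"))
```

The returned filename can then be passed to `joblib.load` as in the example above.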
 ### Using the Inference Script
 ## Model Parameters
+- `dataset`: Dataset selection ("vlsp2016" for general sentiment, "uts2017" for banking aspect sentiment)
 - `model`: Model type ("logistic", "svc_linear", "svc_rbf", "naive_bayes", "decision_tree", "random_forest", etc.)
 - `max_features`: Maximum number of TF-IDF features (default: 20000)
+- `ngram_min/max`: N-gram range (default: 1-2, the best-performing range in the VLSP2016 comparison)
+- `split_ratio`: Fraction of the data held out for testing (default: 0.2, only used for uts2017)
 - `n_samples`: Optional sample limit for quick testing
+- `export_model`: Export model for deployment (creates `<dataset>_sentiment_<timestamp>.joblib`)
+- `compare`: Compare multiple model configurations
+- `compare_models`: Specify models to compare
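The effect of `ngram_min` and `ngram_max` can be illustrated with a few lines of plain Python. This is a simplified sketch that assumes lowercasing and whitespace tokenization; the project's actual TF-IDF tokenization is not shown in this README, and `word_ngrams` is an illustrative helper rather than a function from `train.py`.

```python
from typing import List


def word_ngrams(text: str, ngram_min: int = 1, ngram_max: int = 2) -> List[str]:
    """Return all word n-grams with lengths from ngram_min to ngram_max."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    grams = []
    for n in range(ngram_min, ngram_max + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


sample = "Lãi suất quá cao"
print(word_ngrams(sample, 1, 2))       # unigrams and bigrams
print(len(word_ngrams(sample, 1, 3)))  # adding trigrams enlarges the feature set
```

Raising `ngram_max` from 2 to 3 adds every three-word sequence as a candidate feature, which is one reason the larger range increases computational cost.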
+## Project Management
+### Cleanup Utility
+The project includes a cleanup script to manage training runs:
+```bash
+# Preview runs that will be deleted (without exported models)
+uv run python clean.py --dry-run --verbose
+# Clean up runs without exported models
+uv run python clean.py --yes
+# Interactive cleanup with confirmation
+uv run python clean.py
+```
+**Features:**
+- Automatically identifies runs without exported model files
+- Shows space that will be freed
+- Dry-run mode for safe previewing
+- Detailed information about each run
+- Preserves runs with exported models
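The behaviour described in this feature list can be sketched roughly as follows. This is not the code of `clean.py`: the directory layout (one subdirectory per run, exported models stored as `.joblib` files inside it) and the function names are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def find_runs_without_export(runs_dir: Path):
    """Yield (run_path, size_in_bytes) for runs containing no .joblib file."""
    for run in sorted(p for p in runs_dir.iterdir() if p.is_dir()):
        files = [f for f in run.rglob("*") if f.is_file()]
        if any(f.suffix == ".joblib" for f in files):
            continue  # preserve runs that have an exported model
        yield run, sum(f.stat().st_size for f in files)


def clean(runs_dir: Path, dry_run: bool = True) -> int:
    """Report removable runs and delete them unless dry_run is set.

    Returns the number of bytes freed (or that would be freed).
    """
    freed = 0
    for run, size in find_runs_without_export(runs_dir):
        action = "would remove" if dry_run else "removing"
        print(f"{action} {run.name} ({size} bytes)")
        if not dry_run:
            shutil.rmtree(run)
        freed += size
    return freed
```

Defaulting `dry_run` to `True` in this sketch mirrors the safe-preview idea behind the `--dry-run` flag: nothing is deleted unless the caller asks for it explicitly.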
 ## Limitations
 1. **Language Specificity**: Only works with Vietnamese text
+2. **Domain Coverage**: Limited to two domains (general product/service sentiment and banking aspect sentiment)
 3. **Feature Limitations**: Limited to 20,000 most frequent features
+4. **Class Imbalance Sensitivity**: Performance degrades significantly with imbalanced datasets (evident in UTS2017_Bank)
 5. **Specific Weaknesses**:
+   - **VLSP2016**: Minor performance variation between sentiment classes
+   - **UTS2017_Bank**: Poor performance on minority aspect-sentiment classes due to insufficient training data
+   - **N-gram Limitation**: Adding trigrams (1-3) did not improve accuracy over the 1-2 range on VLSP2016 (70.67% vs 71.14%) while increasing computational cost
+   - Banking domain aspects limited to predefined categories (account, loan, card, etc.)
 ## Ethical Considerations
+- **Dataset Bias**: Models reflect biases present in training datasets (VLSP2016 general reviews, UTS2017_Bank banking feedback)
+- **Performance Variation**: Overall accuracy is similar on both datasets (~71%), but per-class performance varies much more on the imbalanced UTS2017_Bank dataset than on the balanced VLSP2016 dataset
+- **Domain Validation**: Should be validated on target domain before deployment
+- **Class Imbalance**: Consider dataset balance when interpreting results, especially for banking aspect sentiment
+- **Representation**: VLSP2016 provides more equitable performance across sentiment classes due to balanced training data
 ## Citation
 ```bibtex
 @misc{undertheseanlp_2025,
+    author       = { Vu Anh },
+    organization = { UnderTheSea NLP },
+    title        = { Pulse Core 1 - Vietnamese Sentiment Analysis System },
     year         = 2025,
     url          = { https://huggingface.co/undertheseanlp/pulse_core_1 },
     doi          = { 10.57967/hf/6605 },