Vu Anh and Claude committed
Commit 25aa0d2 · 1 Parent(s): 08bbb4c

Update README.md with comprehensive dual-dataset evaluation

Major Updates:
- Add VLSP2016 general sentiment dataset to supported datasets and model-index
- Update title from "Banking Aspect Sentiment" to "Vietnamese Sentiment Analysis System"
- Add comprehensive performance metrics for both datasets:
* VLSP2016: 71.14% (SVC), 70.19% (LR) with balanced per-class performance
* UTS2017_Bank: 71.72% (SVC), 68.18% (LR) with detailed aspect-sentiment analysis

Enhanced Documentation:
- Dataset selection examples with --dataset vlsp2016|uts2017 parameter
- Dual-model usage examples for general vs banking sentiment analysis
- Cross-dataset performance analysis and insights
- N-gram comparison results (bigrams vs trigrams)

New Features Documented:
- clean.py utility for managing training runs
- Project management section with cleanup workflows
- Updated model export naming with dataset prefixes
- Enhanced ethical considerations and limitations

Performance Insights:
- Consistent ~71% accuracy across 3-class and 35-class tasks
- Balanced datasets (VLSP2016) provide equitable per-class performance
- Imbalanced datasets (UTS2017_Bank) show performance variations
- Bigrams optimal for Vietnamese sentiment analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Files changed (1)
  1. README.md +164 -44
README.md CHANGED
@@ -17,6 +17,7 @@ tags:
17
  - financial-nlp
18
  datasets:
19
  - undertheseanlp/UTS2017_Bank
 
20
  metrics:
21
  - accuracy
22
  - precision
@@ -25,6 +26,25 @@ metrics:
25
  model-index:
26
  - name: pulse-core-1
27
    results:
28
    - task:
29
        type: text-classification
30
        name: Vietnamese Banking Aspect Sentiment Analysis
@@ -55,15 +75,15 @@ language:
55
  pipeline_tag: text-classification
56
  ---
57
 
58
- # Pulse Core 1 - Vietnamese Banking Aspect Sentiment Analysis
59
 
60
- A machine learning-based aspect sentiment analysis model designed for Vietnamese banking text processing. Built on TF-IDF feature extraction pipeline combined with various machine learning algorithms, achieving **71.72% accuracy** on UTS2017_Bank aspect sentiment dataset with Support Vector Classification (SVC).
61
 
62
  📋 **[View Detailed System Card](https://huggingface.co/undertheseanlp/pulse_core_1/blob/main/paper/pulse_core_1_technical_report.tex)** for comprehensive model documentation, performance analysis, and limitations.
63
 
64
  ## Model Description
65
 
66
- **Pulse Core 1** is a Vietnamese banking aspect sentiment analysis model that analyzes both the aspect (what the text is about) and sentiment (positive/negative/neutral) of Vietnamese banking-related text. The model is specifically designed for Vietnamese banking customer feedback analysis, banking service categorization, and sentiment analysis for Vietnamese financial services.
67
 
68
  ### Model Architecture
69
 
@@ -75,7 +95,20 @@ A machine learning-based aspect sentiment analysis model designed for Vietnamese
75
  - **Framework**: scikit-learn ≥1.6
76
  - **Caching System**: Hash-based caching for efficient processing
77
 
78
- ## Supported Dataset & Categories
79
 
80
  ### UTS2017_Bank Dataset - Banking Aspect Sentiment (35 combined classes)
81
 
@@ -115,22 +148,40 @@ pip install scikit-learn>=1.6 joblib
115
 
116
  ### Training the Model
117
 
118
- #### UTS2017_Bank Dataset (Banking Aspect Sentiment Analysis)
 
 
119
  ```bash
120
- # Default training with Logistic Regression
121
- python train.py --model logistic
122
 
123
  # Train with SVC for better performance
124
- python train.py --model svc_linear
125
 
126
  # With specific parameters
127
- python train.py --model logistic --max-features 20000 --ngram-min 1 --ngram-max 2
128
 
129
  # Export model for deployment
130
- python train.py --model logistic --export-model
131
 
132
- # Compare multiple models
133
- python train.py --compare-models logistic svc_linear
134
  ```
135
 
136
  ### Training from Scratch
@@ -138,8 +189,19 @@ python train.py --compare-models logistic svc_linear
138
  ```python
139
  from train import train_notebook
140
 
141
  # Train UTS2017_Bank aspect sentiment model
142
  results = train_notebook(
 
143
  model_name="logistic",
144
  max_features=20000,
145
  ngram_min=1,
@@ -147,28 +209,48 @@ results = train_notebook(
147
  export_model=True
148
  )
149
 
150
- # Compare multiple models
151
  comparison_results = train_notebook(
 
152
  compare=True
153
  )
154
  ```
155
 
156
  ## Performance Metrics
157
 
158
  ### UTS2017_Bank Aspect Sentiment Analysis Performance
159
- - **Training Accuracy**: 94.31%
160
- - **Test Accuracy**: 71.72%
161
  - **Training Samples**: 1,581
162
  - **Test Samples**: 396
163
  - **Number of Classes**: 35 aspect-sentiment combinations
164
- - **Training Time**: ~7.71 seconds
165
  - **Best Performing Classes**:
166
  - `TRADEMARK#positive`: 90% F1-score
167
  - `CUSTOMER_SUPPORT#positive`: 88% F1-score
168
- - `LOAN#negative`: 67% F1-score
169
  - `CUSTOMER_SUPPORT#negative`: 65% F1-score
170
  - **Challenges**: Class imbalance affects minority aspect-sentiment combinations
171
- - **Model Type**: Support Vector Classification (SVC) with TF-IDF (20k features, 1-2 ngrams)
 
 
 
 
 
172
 
173
  ## Using the Pre-trained Models
174
 
@@ -177,29 +259,37 @@ comparison_results = train_notebook(
177
  ```python
178
  import joblib
179
 
180
- # Load local exported model
181
- sentiment_model = joblib.load("uts2017_sentiment_20250928_131716.joblib")
 
 
 
182
 
183
  # Or use inference script directly
184
  from inference import predict_text
185
 
186
- # Make prediction on banking text
187
- bank_text = "Tôi muốn mở tài khoản tiết kiệm"
188
- prediction, confidence, top_predictions = predict_text(sentiment_model, bank_text)
 
 
 
 
 
 
189
 
190
- print(f"Aspect-Sentiment: {prediction}")
191
  print(f"Confidence: {confidence:.3f}")
192
  print("Top 3 predictions:")
193
  for i, (category, prob) in enumerate(top_predictions, 1):
194
      print(f" {i}. {category}: {prob:.3f}")
195
 
196
- # Example output:
197
- # Aspect-Sentiment: CUSTOMER_SUPPORT#negative
198
- # Confidence: 0.301
199
  # Top 3 predictions:
200
- # 1. CUSTOMER_SUPPORT#negative: 0.301
201
- # 2. TRADEMARK#positive: 0.187
202
- # 3. CUSTOMER_SUPPORT#positive: 0.095
203
  ```
204
 
205
  ### Using the Inference Script
@@ -221,30 +311,59 @@ python inference.py --list-models
221
 
222
  ## Model Parameters
223
 
 
224
  - `model`: Model type ("logistic", "svc_linear", "svc_rbf", "naive_bayes", "decision_tree", "random_forest", etc.)
225
  - `max_features`: Maximum number of TF-IDF features (default: 20000)
226
- - `ngram_min/max`: N-gram range (default: 1-2)
227
- - `split_ratio`: Train/test split ratio (default: 0.2)
228
  - `n_samples`: Optional sample limit for quick testing
229
- - `export_model`: Export model for deployment (creates `uts2017_sentiment_<timestamp>.joblib`)
230
 
231
  ## Limitations
232
 
233
  1. **Language Specificity**: Only works with Vietnamese text
234
- 2. **Domain Specificity**: Optimized specifically for Vietnamese banking domain
235
  3. **Feature Limitations**: Limited to 20,000 most frequent features
236
- 4. **Class Imbalance Sensitivity**: Performance degrades significantly with imbalanced aspect-sentiment combinations
237
  5. **Specific Weaknesses**:
238
- - Poor performance on minority aspect-sentiment classes due to insufficient training data
239
- - Limited to banking domain aspects (account, loan, card, etc.)
240
- - Sentiment analysis accuracy varies by aspect type
 
241
 
242
  ## Ethical Considerations
243
 
244
- - Model reflects biases present in training datasets
245
- - Performance varies significantly across categories
246
- - Should be validated on target domain before deployment
247
- - Consider class imbalance when interpreting results
 
248
 
249
  ## Citation
250
 
@@ -252,8 +371,9 @@ If you use this model, please cite:
252
 
253
  ```bibtex
254
  @misc{undertheseanlp_2025,
255
- author = { undertheseanlp },
256
- title = { Pulse Core 1 - Vietnamese Text Classification Model },
 
257
  year = 2025,
258
  url = { https://huggingface.co/undertheseanlp/pulse_core_1 },
259
  doi = { 10.57967/hf/6605 },
 
17
  - financial-nlp
18
  datasets:
19
  - undertheseanlp/UTS2017_Bank
20
+ - ura-hcmut/vlsp2016
21
  metrics:
22
  - accuracy
23
  - precision
 
26
  model-index:
27
  - name: pulse-core-1
28
    results:
29
+   - task:
30
+       type: text-classification
31
+       name: Vietnamese General Sentiment Analysis
32
+     dataset:
33
+       name: VLSP2016
34
+       type: ura-hcmut/vlsp2016
35
+     metrics:
36
+     - type: accuracy
37
+       value: 0.7114
38
+       name: Test Accuracy (SVC Linear)
39
+     - type: accuracy
40
+       value: 0.7019
41
+       name: Test Accuracy (Logistic Regression)
42
+     - type: f1-score
43
+       value: 0.713
44
+       name: Weighted F1-Score (SVC)
45
+     - type: f1-score
46
+       value: 0.703
47
+       name: Weighted F1-Score (Logistic Regression)
48
    - task:
49
        type: text-classification
50
        name: Vietnamese Banking Aspect Sentiment Analysis
 
75
  pipeline_tag: text-classification
76
  ---
77
 
78
+ # Pulse Core 1 - Vietnamese Sentiment Analysis System
79
 
80
+ A machine learning-based sentiment analysis system for Vietnamese text. It is built on a TF-IDF feature extraction pipeline combined with several machine learning algorithms, and achieves **71.14% accuracy** on the VLSP2016 general sentiment dataset and **71.72% accuracy** on the UTS2017_Bank banking aspect sentiment dataset with Support Vector Classification (SVC).
81
 
82
  📋 **[View Detailed System Card](https://huggingface.co/undertheseanlp/pulse_core_1/blob/main/paper/pulse_core_1_technical_report.tex)** for comprehensive model documentation, performance analysis, and limitations.
83
 
84
  ## Model Description
85
 
86
+ **Pulse Core 1** is a versatile Vietnamese sentiment analysis system that supports both general sentiment classification and specialized banking aspect sentiment analysis. The system can analyze general Vietnamese text sentiment (positive/negative/neutral) and banking-specific aspect sentiment (combining banking aspects with sentiment polarities). It's designed for Vietnamese text analysis across multiple domains, with specialized capabilities for banking customer feedback analysis and financial service categorization.
87
 
88
  ### Model Architecture
89
 
 
95
  - **Framework**: scikit-learn ≥1.6
96
  - **Caching System**: Hash-based caching for efficient processing
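
The README does not say what the hash-based cache is keyed on. The following is a minimal sketch, assuming the key is a SHA-256 digest of the training configuration; `config_hash` and `cached` are hypothetical names, not functions from this repository.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Return a stable hex digest for a JSON-serializable configuration."""
    # sort_keys makes the digest independent of dict insertion order
    payload = json.dumps(config, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


_cache: dict = {}


def cached(config: dict, compute):
    """Run compute(config) once per distinct configuration (hypothetical helper)."""
    key = config_hash(config)
    if key not in _cache:
        _cache[key] = compute(config)
    return _cache[key]


calls = []


def fake_vectorize(cfg):
    # Stand-in for an expensive step such as TF-IDF fitting
    calls.append(cfg)
    return f"features for {cfg['dataset']}"


a = cached({"dataset": "vlsp2016", "max_features": 20000, "ngram": [1, 2]}, fake_vectorize)
b = cached({"ngram": [1, 2], "max_features": 20000, "dataset": "vlsp2016"}, fake_vectorize)
print(a == b, len(calls))  # True 1: the second call is served from the cache
```

Hashing a canonical JSON dump rather than the dict itself keeps the key stable across runs and key orderings, which is what lets a cache survive between processes if the digest is used as a file name.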
97
 
98
+ ## Supported Datasets & Categories
99
+
100
+ ### VLSP2016 Dataset - General Sentiment Analysis (3 classes)
101
+
102
+ **Sentiment Categories:**
103
+ - **positive** - Positive sentiment towards products/services
104
+ - **negative** - Negative sentiment towards products/services
105
+ - **neutral** - Neutral or mixed sentiment
106
+
107
+ **Dataset Statistics:**
108
+ - Training samples: 5,100 (1,700 per class)
109
+ - Test samples: 1,050 (350 per class)
110
+ - Balanced distribution across all sentiment classes
111
+ - Domain: General product and service reviews
112
 
113
  ### UTS2017_Bank Dataset - Banking Aspect Sentiment (35 combined classes)
114
 
 
148
 
149
  ### Training the Model
150
 
151
+ #### Dataset Selection and Training
152
+
153
+ **VLSP2016 Dataset (General Sentiment Analysis):**
154
  ```bash
155
+ # Train on VLSP2016 with Logistic Regression
156
+ python train.py --dataset vlsp2016 --model logistic
157
 
158
  # Train with SVC for better performance
159
+ python train.py --dataset vlsp2016 --model svc_linear
160
+
161
+ # Compare n-gram ranges
162
+ python train.py --dataset vlsp2016 --model svc_linear --ngram-min 1 --ngram-max 2
163
+ python train.py --dataset vlsp2016 --model svc_linear --ngram-min 1 --ngram-max 3
164
+
165
+ # Export model for deployment
166
+ python train.py --dataset vlsp2016 --model svc_linear --export-model
167
+ ```
168
+
169
+ **UTS2017_Bank Dataset (Banking Aspect Sentiment Analysis):**
170
+ ```bash
171
+ # Train on UTS2017_Bank (default dataset)
172
+ python train.py --dataset uts2017 --model logistic
173
+
174
+ # Train with SVC for better performance
175
+ python train.py --dataset uts2017 --model svc_linear
176
 
177
  # With specific parameters
178
+ python train.py --dataset uts2017 --model logistic --max-features 20000 --ngram-min 1 --ngram-max 2
179
 
180
  # Export model for deployment
181
+ python train.py --dataset uts2017 --model logistic --export-model
182
 
183
+ # Compare multiple models on specific dataset
184
+ python train.py --dataset vlsp2016 --compare-models logistic svc_linear
185
  ```
186
 
187
  ### Training from Scratch
 
189
  ```python
190
  from train import train_notebook
191
 
192
+ # Train VLSP2016 general sentiment model
193
+ results = train_notebook(
194
+ dataset="vlsp2016",
195
+ model_name="svc_linear",
196
+ max_features=20000,
197
+ ngram_min=1,
198
+ ngram_max=2,
199
+ export_model=True
200
+ )
201
+
202
  # Train UTS2017_Bank aspect sentiment model
203
  results = train_notebook(
204
+ dataset="uts2017",
205
  model_name="logistic",
206
  max_features=20000,
207
  ngram_min=1,
 
209
  export_model=True
210
  )
211
 
212
+ # Compare multiple models on VLSP2016
213
  comparison_results = train_notebook(
214
+ dataset="vlsp2016",
215
  compare=True
216
  )
217
  ```
218
 
219
  ## Performance Metrics
220
 
221
+ ### VLSP2016 General Sentiment Analysis Performance
222
+ - **Training Accuracy**: 94.57% (SVC Linear)
223
+ - **Test Accuracy**: 71.14% (SVC Linear, 1-2 ngram) / 70.67% (SVC Linear, 1-3 ngram) / 70.19% (Logistic Regression)
224
+ - **Training Samples**: 5,100 (balanced: 1,700 per class)
225
+ - **Test Samples**: 1,050 (balanced: 350 per class)
226
+ - **Number of Classes**: 3 sentiment polarities
227
+ - **Training Time**: ~24.95 seconds (SVC) / 0.75 seconds (LR)
228
+ - **Per-Class Performance (SVC Linear)**:
229
+ - **Positive**: 80% precision, 72% recall, 76% F1-score
230
+ - **Negative**: 70% precision, 72% recall, 71% F1-score
231
+ - **Neutral**: 65% precision, 69% recall, 67% F1-score
232
+ - **Key Insights**: Per-class F1 stays within 67-76% across the three sentiment classes, which is consistent with the balanced dataset
233
+ - **Optimal N-gram**: Bigrams (1-2) outperform trigrams (1-3) by 0.47 percentage points
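
The per-class figures above can be sanity-checked, since F1 is the harmonic mean of precision and recall. This is only an approximate check: the inputs are the rounded percentages from the list, not the raw classifier output.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for SVC Linear on VLSP2016, as listed above
reported = {
    "positive": (0.80, 0.72, 0.76),
    "negative": (0.70, 0.72, 0.71),
    "neutral": (0.65, 0.69, 0.67),
}

recomputed = {label: round(f1_score(p, r), 2) for label, (p, r, _) in reported.items()}
for label, (_, _, f1) in reported.items():
    print(f"{label}: reported F1={f1:.2f}, recomputed F1={recomputed[label]:.2f}")
```

All three recomputed values match the reported F1-scores at two decimal places.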
234
+
235
  ### UTS2017_Bank Aspect Sentiment Analysis Performance
236
+ - **Training Accuracy**: 94.57% (SVC)
237
+ - **Test Accuracy**: 71.72% (SVC) / 68.18% (Logistic Regression)
238
  - **Training Samples**: 1,581
239
  - **Test Samples**: 396
240
  - **Number of Classes**: 35 aspect-sentiment combinations
241
+ - **Training Time**: ~5.3 seconds (SVC) / 2.13 seconds (LR)
242
  - **Best Performing Classes**:
243
  - `TRADEMARK#positive`: 90% F1-score
244
  - `CUSTOMER_SUPPORT#positive`: 88% F1-score
245
+ - `LOAN#negative`: 67% F1-score (SVC improvement over LR)
246
  - `CUSTOMER_SUPPORT#negative`: 65% F1-score
247
  - **Challenges**: Class imbalance affects minority aspect-sentiment combinations
248
+ - **Key Finding**: SVC shows superior category diversity compared to Logistic Regression
249
+
250
+ ### Cross-Dataset Performance Analysis
251
+ - **Consistent SVC Performance**: ~71% accuracy on both 3-class (VLSP2016) and 35-class (UTS2017_Bank) tasks
252
+ - **Balance Impact**: Balanced datasets (VLSP2016) yield consistent per-class results while imbalanced datasets create performance variations
253
+ - **Training Efficiency**: The larger VLSP2016 dataset takes longer to train with SVC (~24.95 seconds vs. ~5.3 seconds on UTS2017_Bank) but gives more even per-class results
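
To judge how large the reported differences are, it can help to translate the accuracy percentages into counts of correctly classified test examples. The sketch below assumes accuracy is simply correct predictions divided by the test-set sizes listed above.

```python
def correct_count(accuracy_pct: float, n_test: int) -> int:
    """Nearest whole number of correct predictions for a reported accuracy."""
    return round(accuracy_pct / 100 * n_test)


# (dataset, configuration) -> (reported test accuracy in %, test samples)
results = {
    ("VLSP2016", "SVC Linear, 1-2 ngram"): (71.14, 1050),
    ("VLSP2016", "SVC Linear, 1-3 ngram"): (70.67, 1050),
    ("VLSP2016", "Logistic Regression"): (70.19, 1050),
    ("UTS2017_Bank", "SVC"): (71.72, 396),
    ("UTS2017_Bank", "Logistic Regression"): (68.18, 396),
}

counts = {key: correct_count(acc, n) for key, (acc, n) in results.items()}
for (dataset, config), correct in counts.items():
    n = results[(dataset, config)][1]
    print(f"{dataset} / {config}: {correct} of {n} correct")
```

Read this way, the 0.47 percentage point gap between the 1-2 and 1-3 n-gram settings on VLSP2016 corresponds to 5 test examples out of 1,050, so it is best treated as a small difference rather than a decisive one.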
254
 
255
  ## Using the Pre-trained Models
256
 
 
259
  ```python
260
  import joblib
261
 
262
+ # Load VLSP2016 general sentiment model
263
+ general_model = joblib.load("vlsp2016_sentiment_20250929_075529.joblib")
264
+
265
+ # Load UTS2017_Bank aspect sentiment model
266
+ banking_model = joblib.load("uts2017_sentiment_20250928_131716.joblib")
267
 
268
  # Or use inference script directly
269
  from inference import predict_text
270
 
271
+ # General sentiment analysis
272
+ general_text = "Sản phẩm này rất tốt, tôi rất hài lòng"
273
+ prediction, confidence, top_predictions = predict_text(general_model, general_text)
274
+ print(f"General Sentiment: {prediction}") # Expected: positive
275
+
276
+ # Banking aspect sentiment analysis
277
+ bank_text = "Lãi suất vay mua nhà hiện tại quá cao"
278
+ prediction, confidence, top_predictions = predict_text(banking_model, bank_text)
279
+ print(f"Banking Aspect-Sentiment: {prediction}") # Expected: INTEREST_RATE#negative
280
 
 
281
  print(f"Confidence: {confidence:.3f}")
282
  print("Top 3 predictions:")
283
  for i, (category, prob) in enumerate(top_predictions, 1):
284
      print(f" {i}. {category}: {prob:.3f}")
285
 
286
+ # Example output for banking text:
287
+ # Banking Aspect-Sentiment: INTEREST_RATE#negative
288
+ # Confidence: 0.509
289
  # Top 3 predictions:
290
+ # 1. INTEREST_RATE#negative: 0.509
291
+ # 2. LOAN#negative: 0.218
292
+ # 3. CUSTOMER_SUPPORT#negative: 0.095
293
  ```
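
The file names in the example above embed a training timestamp, so they change with every export. A small sketch for picking the most recent export of a dataset, assuming the `<dataset>_sentiment_<timestamp>.joblib` naming and a `YYYYMMDD_HHMMSS` timestamp as in the example; `latest_export` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Optional


def latest_export(dataset: str, directory: str = ".") -> Optional[Path]:
    """Return the newest `<dataset>_sentiment_<timestamp>.joblib` file, or None."""
    # Timestamps of the form YYYYMMDD_HHMMSS sort chronologically as plain strings
    candidates = sorted(Path(directory).glob(f"{dataset}_sentiment_*.joblib"))
    return candidates[-1] if candidates else None


path = latest_export("vlsp2016")
if path is None:
    print("No exported vlsp2016 model found; train with --export-model first")
else:
    print(f"Most recent export: {path} (load it with joblib.load)")
```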
294
 
295
  ### Using the Inference Script
 
311
 
312
  ## Model Parameters
313
 
314
+ - `dataset`: Dataset selection ("vlsp2016" for general sentiment, "uts2017" for banking aspect sentiment)
315
  - `model`: Model type ("logistic", "svc_linear", "svc_rbf", "naive_bayes", "decision_tree", "random_forest", etc.)
316
  - `max_features`: Maximum number of TF-IDF features (default: 20000)
317
+ - `ngram_min/max`: N-gram range (default: 1-2, optimal for Vietnamese)
318
+ - `split_ratio`: Train/test split ratio (default: 0.2, only used for uts2017)
319
  - `n_samples`: Optional sample limit for quick testing
320
+ - `export_model`: Export model for deployment (creates `<dataset>_sentiment_<timestamp>.joblib`)
321
+ - `compare`: Compare multiple model configurations
322
+ - `compare_models`: Specify models to compare
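
As a rough illustration of how these parameters could translate into scikit-learn objects, here is a minimal sketch. The mapping of `logistic` and `svc_linear` to `LogisticRegression` and `LinearSVC` is an assumption, `build_pipeline` is a hypothetical helper, and the four training sentences are made up purely to show the call pattern; the actual `train.py` may be organized differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_pipeline(model: str = "svc_linear", max_features: int = 20000,
                   ngram_min: int = 1, ngram_max: int = 2) -> Pipeline:
    """Assumed mapping of the documented parameters onto scikit-learn objects."""
    classifiers = {
        "logistic": LogisticRegression(max_iter=1000),
        "svc_linear": LinearSVC(),
    }
    return Pipeline([
        ("tfidf", TfidfVectorizer(max_features=max_features,
                                  ngram_range=(ngram_min, ngram_max))),
        ("clf", classifiers[model]),
    ])


# Tiny made-up examples, only to show the call pattern
texts = ["sản phẩm rất tốt", "dịch vụ quá tệ", "tôi rất hài lòng", "chất lượng quá kém"]
labels = ["positive", "negative", "positive", "negative"]

pipeline = build_pipeline("svc_linear")
pipeline.fit(texts, labels)
print(pipeline.predict(["sản phẩm quá tệ"]))
```

Note that `LinearSVC` does not expose `predict_proba`, so a pipeline built this way would need an extra calibration step to report confidence scores.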
323
+
324
+ ## Project Management
325
+
326
+ ### Cleanup Utility
327
+
328
+ The project includes a cleanup script to manage training runs:
329
+
330
+ ```bash
331
+ # Preview runs that will be deleted (without exported models)
332
+ uv run python clean.py --dry-run --verbose
333
+
334
+ # Clean up runs without exported models
335
+ uv run python clean.py --yes
336
+
337
+ # Interactive cleanup with confirmation
338
+ uv run python clean.py
339
+ ```
340
+
341
+ **Features:**
342
+ - Automatically identifies runs without exported model files
343
+ - Shows space that will be freed
344
+ - Dry-run mode for safe previewing
345
+ - Detailed information about each run
346
+ - Preserves runs with exported models
347
 
348
  ## Limitations
349
 
350
  1. **Language Specificity**: Only works with Vietnamese text
351
+ 2. **Domain Coverage**: Trained and evaluated on two tasks only (general sentiment and banking aspect sentiment)
352
  3. **Feature Limitations**: Limited to 20,000 most frequent features
353
+ 4. **Class Imbalance Sensitivity**: Performance degrades significantly with imbalanced datasets (evident in UTS2017_Bank)
354
  5. **Specific Weaknesses**:
355
+ - **VLSP2016**: Minor performance variation between sentiment classes
356
+ - **UTS2017_Bank**: Poor performance on minority aspect-sentiment classes due to insufficient training data
357
+ - **N-gram Limitation**: Adding trigrams (1-3) did not improve on bigrams (1-2) on VLSP2016 (70.67% vs. 71.14% test accuracy) while increasing computational cost
358
+ - Banking domain aspects limited to predefined categories (account, loan, card, etc.)
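
To make the n-gram cost concrete, the sketch below counts the candidate features that a single sentence produces under the 1-2 and 1-3 settings. It uses plain whitespace tokenization, which is a simplifying assumption; the TF-IDF vectorizer in the actual pipeline may tokenize differently, and `max_features` caps the final vocabulary.

```python
import unicodedata


def word_ngrams(text: str, n_min: int, n_max: int) -> list[str]:
    """All whitespace-token n-grams with n_min <= n <= n_max, shortest first."""
    # NFC normalization keeps Vietnamese diacritics as single code points
    tokens = unicodedata.normalize("NFC", text).lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


sentence = "Lãi suất vay mua nhà hiện tại quá cao"
bigram_features = word_ngrams(sentence, 1, 2)
trigram_features = word_ngrams(sentence, 1, 3)
print(len(bigram_features), len(trigram_features))  # 17 24
```

For this nine-word sentence, adding trigrams raises the candidate count from 17 to 24, which is the kind of growth that increases vectorization and training cost.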
359
 
360
  ## Ethical Considerations
361
 
362
+ - **Dataset Bias**: Models reflect biases present in training datasets (VLSP2016 general reviews, UTS2017_Bank banking feedback)
363
+ - **Performance Variation**: Per-class performance is far more uneven on the imbalanced UTS2017_Bank dataset than on the balanced VLSP2016 dataset, even though overall accuracy is similar (~71%)
364
+ - **Domain Validation**: Should be validated on target domain before deployment
365
+ - **Class Imbalance**: Consider dataset balance when interpreting results, especially for banking aspect sentiment
366
+ - **Representation**: VLSP2016 provides more equitable performance across sentiment classes due to balanced training data
367
 
368
  ## Citation
369
 
 
371
 
372
  ```bibtex
373
  @misc{undertheseanlp_2025,
374
+ author = { Vu Anh },
375
+ organization = { UnderTheSea NLP },
376
+ title = { Pulse Core 1 - Vietnamese Sentiment Analysis System },
377
  year = 2025,
378
  url = { https://huggingface.co/undertheseanlp/pulse_core_1 },
379
  doi = { 10.57967/hf/6605 },