Update analysis results and system card with SVC model performance

- Added SVC model results achieving 92.80% accuracy on VNTC dataset
- Updated paper with comparative analysis between SVC and Logistic Regression
- Included training time comparisons showing accuracy-efficiency trade-offs
- Cleaned up analyze_results.py: removed the unused glob import and dropped the f-prefix from print strings that have no placeholders
- Added changelog entry for September 28, 2025

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Files changed (2)

analyze_results.py +2 -3
paper/sonar_core_1_system_card.tex +17 -7

analyze_results.py CHANGED

@@ -5,7 +5,6 @@ Script to analyze and compare training results from multiple model runs.
 import json
 import os
-import glob
 from pathlib import Path
 def load_metadata(run_dir):
@@ -75,7 +74,7 @@ def print_comparison_table(results):
     bank_results = [r for r in results if r['dataset'] == 'UTS2017_Bank']
     if bank_results:
-        print(f"\nUTS2017_Bank Dataset (Vietnamese Banking Text Classification):")
+        print("\nUTS2017_Bank Dataset (Vietnamese Banking Text Classification):")
         print("-"*120)
         print(f"{'Model':<20} {'Features':<10} {'N-gram':<10} {'Train Acc':<12} {'Test Acc':<12} {'Train Time':<12} {'Pred Time':<12}")
         print("-"*120)
@@ -121,7 +120,7 @@ def main():
     vntc_results = [r for r in results if r['dataset'] == 'VNTC']
     bank_results = [r for r in results if r['dataset'] == 'UTS2017_Bank']
-    print(f"\nSummary:")
+    print("\nSummary:")
     print(f"- VNTC runs: {len(vntc_results)}")
     print(f"- UTS2017_Bank runs: {len(bank_results)}")
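Both print changes in analyze_results.py only drop the f prefix from strings that contain no placeholders (the pattern pyflakes and Ruff report as F541), so the printed output is unchanged. For readers reconstructing the table logic, here is a minimal sketch of a per-dataset comparison table that reuses the column widths from the header line in the diff. It is an illustration, not the script itself: apart from 'dataset', the record keys ('model', 'max_features', 'ngram', 'train_acc', 'test_acc', 'train_time', 'pred_time') and the helper names are hypothetical.

```python
# Hypothetical sketch of a per-dataset comparison table in the style of
# print_comparison_table() in analyze_results.py.
# ASSUMPTION: every key except 'dataset' ('model', 'max_features', 'ngram',
# 'train_acc', 'test_acc', 'train_time', 'pred_time') is illustrative only.

HEADER = (f"{'Model':<20} {'Features':<10} {'N-gram':<10} {'Train Acc':<12} "
          f"{'Test Acc':<12} {'Train Time':<12} {'Pred Time':<12}")


def format_row(r):
    """Format one run so its columns line up under HEADER."""
    return (f"{r['model']:<20} {r['max_features']:<10} {str(r['ngram']):<10} "
            f"{r['train_acc']:<12.4f} {r['test_acc']:<12.4f} "
            f"{r['train_time']:<12.2f} {r['pred_time']:<12.2f}")


def print_dataset_table(results, dataset, title):
    """Print the table for one dataset and return the formatted data rows."""
    rows = [format_row(r) for r in results if r['dataset'] == dataset]
    if rows:
        print(f"\n{title}:")  # has a placeholder, so the f prefix is needed here
        print("-" * 120)
        print(HEADER)
        print("-" * 120)
        for row in rows:
            print(row)
    return rows
```

Filtering inside print_dataset_table keeps the per-dataset branches (VNTC, UTS2017_Bank) from duplicating the header and separator code.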

paper/sonar_core_1_system_card.tex CHANGED

@@ -23,7 +23,7 @@
 \maketitle
 \begin{abstract}
-This paper presents Sonar Core 1, a Vietnamese text classification system employing Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction combined with logistic regression. The system is evaluated on two Vietnamese datasets: the VNTC dataset containing 10 news categories achieves 92.33\% classification accuracy, while the UTS2017\_Bank dataset spanning 14 banking service categories achieves 70.96\% accuracy. The implementation utilizes a 20,000-dimensional TF-IDF feature space with n-gram analysis and incorporates hash-based caching for computational optimization. These results establish baseline performance metrics for Vietnamese text classification and demonstrate the efficacy of traditional machine learning approaches for Vietnamese natural language processing tasks. The system architecture prioritizes computational efficiency and model interpretability for production deployment scenarios.
+This paper presents Sonar Core 1, a Vietnamese text classification system employing Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction combined with multiple classification algorithms. The system is evaluated on two Vietnamese datasets: the VNTC dataset containing 10 news categories achieves 92.80\% accuracy with Support Vector Classification (SVC) and 92.33\% with logistic regression, while the UTS2017\_Bank dataset spanning 14 banking service categories achieves 70.96\% accuracy with logistic regression. The implementation utilizes a 20,000-dimensional TF-IDF feature space with n-gram analysis and incorporates hash-based caching for computational optimization. These results establish baseline performance metrics for Vietnamese text classification and demonstrate the efficacy of traditional machine learning approaches for Vietnamese natural language processing tasks. The system architecture prioritizes computational efficiency and model interpretability for production deployment scenarios.
 \end{abstract}
 \section{Introduction}
@@ -177,17 +177,18 @@ VNTC (10 topics) & Toan et al. (2017) - Neural Network & 99.75\% \\
 VNTC (10 topics) & Toan et al. (2017) - SVC & 99.22\% \\
 VNTC (10 topics) & Toan et al. (2017) - Random Forest & 99.21\% \\
 VNTC (10 topics) & Toan et al. (2017) - SVM & 96.52\% \\
-VNTC (10 topics) & \textbf{Sonar Core 1 - TF-IDF with Logistic Regression} & \textbf{92.33\%} \\
+VNTC (10 topics) & \textbf{Sonar Core 1 - SVC with TF-IDF} & \textbf{92.80\%} \\
+VNTC (10 topics) & \textbf{Sonar Core 1 - Logistic Regression with TF-IDF} & \textbf{92.33\%} \\
 \hline
 VNTC (27 topics) & Toan et al. (2017) - Neural Network & 99.69\% \\
 VNTC (27 topics) & Toan et al. (2017) - SVC & 99.65\% \\
 VNTC (27 topics) & Toan et al. (2017) - Random Forest & 99.25\% \\
 VNTC (27 topics) & Toan et al. (2017) - SVM & 97.80\% \\
 \hline
-UTS2017\_Bank (14 topics) & \textbf{Sonar Core 1 - TF-IDF with Logistic Regression} & \textbf{70.96\%} \\
+UTS2017\_Bank (14 topics) & \textbf{Sonar Core 1 - Logistic Regression with TF-IDF} & \textbf{70.96\%} \\
 \hline
 \end{tabular}
-\caption{Comprehensive performance comparison between TF-IDF with logistic regression approach and established methods from \citet{toan2017vietnamese} on Vietnamese text classification tasks, grouped by dataset categories.}
+\caption{Comprehensive performance comparison between TF-IDF-based approaches and established methods from \citet{toan2017vietnamese} on Vietnamese text classification tasks, grouped by dataset categories.}
 \label{tab:comprehensive_comparison}
 \end{table}
@@ -202,9 +203,11 @@ This section presents comprehensive experimental results across both Vietnamese
 \textbf{VNTC Dataset (News Classification):}
 The system demonstrates robust performance on the VNTC news classification dataset:
 \begin{itemize}
-    \item \textbf{Test Classification Accuracy}: 92.33\%
-    \item \textbf{Training Latency}: 27.18 seconds (optimized with hash-based caching)
-    \item \textbf{Inference Latency}: 19.34 seconds for 50,373 test samples (0.38 ms per sample)
+    \item \textbf{Best Test Classification Accuracy (SVC)}: 92.80\%
+    \item \textbf{Logistic Regression Test Accuracy}: 92.33\%
+    \item \textbf{Training Latency (Logistic Regression)}: 31.9 seconds (optimized with hash-based caching)
+    \item \textbf{Training Latency (SVC)}: 3,278.4 seconds
+    \item \textbf{Inference Latency (Logistic Regression)}: 24.5 seconds for 50,373 test samples (0.49 ms per sample)
     \item \textbf{Macro Average F1-Score}: 0.91
     \item \textbf{Weighted Average F1-Score}: 0.92
 \end{itemize}
@@ -404,6 +407,13 @@ The authors acknowledge the contributions of the VNTC and UTS2017\_Bank dataset
 \section{Changelog}
+\textbf{2025-09-28}
+\begin{itemize}
+    \item Added SVC model evaluation achieving 92.80\% accuracy on VNTC dataset
+    \item Completed comparative analysis of multiple classification algorithms
+    \item Updated performance benchmarks with SVC outperforming Logistic Regression
+\end{itemize}
 \textbf{2025-09-27}
 \begin{itemize}
     \item Added support for UTS2017\_Bank Vietnamese banking text classification dataset
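The abstract describes a 20,000-dimensional TF-IDF feature space with n-gram analysis but does not spell out the vectorizer settings. As a rough, dependency-free illustration, assuming word unigrams and bigrams over whitespace-separated syllables (the diff states neither the n-gram range nor the tokenizer), the sketch below caps the vocabulary at max_features by corpus frequency and applies the smoothed IDF, ln((1 + n) / (1 + df)) + 1, that scikit-learn's TfidfVectorizer uses by default, followed by L2 normalization. It is not the Sonar Core 1 implementation; ngrams, fit_tfidf and transform are hypothetical names.

```python
import math
from collections import Counter


def ngrams(text, n_min=1, n_max=2):
    """Lowercase, split on whitespace, and emit word n-grams of length n_min..n_max.

    ASSUMPTION: plain whitespace tokenization (Vietnamese syllables are
    space-separated); the real system may use a word segmenter instead.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_tfidf(docs, max_features=20000, n_min=1, n_max=2):
    """Keep the max_features most frequent n-grams and compute smoothed IDF weights."""
    counts = [Counter(ngrams(d, n_min, n_max)) for d in docs]
    total, df = Counter(), Counter()
    for c in counts:
        total.update(c)          # corpus-wide term frequency (vocabulary ranking)
        df.update(c.keys())      # document frequency (IDF)
    vocab = [t for t, _ in total.most_common(max_features)]
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf


def transform(doc, vocab, idf, n_min=1, n_max=2):
    """Return an L2-normalized sparse TF-IDF vector (dict term -> weight) for one document."""
    c = Counter(ngrams(doc, n_min, n_max))
    vec = {t: cnt * idf[t] for t, cnt in c.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec
```

Sparse vectors of this kind are what a linear classifier such as logistic regression or a linear SVM would consume; out-of-vocabulary n-grams are simply dropped at transform time.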
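The system card credits hash-based caching for the logistic regression training latency, but the diff does not show how the cache works. One plausible scheme, sketched here purely as an assumption: hash the input texts together with the feature configuration and reuse a pickled result whenever the same hash reappears. cache_key, cached and the one-file-per-hash layout are hypothetical.

```python
import hashlib
import json
import pickle
from pathlib import Path


def cache_key(texts, config):
    """SHA-256 over the feature config (e.g. max_features, n-gram range) and all input texts.

    ASSUMPTION: hypothetical keying scheme, not taken from the Sonar Core 1 code.
    """
    h = hashlib.sha256()
    h.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    for t in texts:
        h.update(t.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] hash differently
    return h.hexdigest()


def cached(cache_dir, texts, config, compute):
    """Return compute(texts, config), reusing a pickled result when the hash matches."""
    path = Path(cache_dir) / f"{cache_key(texts, config)}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)  # only load cache files you created yourself
    result = compute(texts, config)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Including the config in the key matters: changing max_features or the n-gram range must invalidate previously cached features rather than silently reuse them.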
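The updated VNTC figures make the accuracy-efficiency trade-off mentioned in the commit message easy to quantify. This small check uses only numbers from the diff; per_sample_ms and tradeoff are illustrative helpers, not part of analyze_results.py.

```python
def per_sample_ms(total_seconds, n_samples):
    """Convert a total inference time in seconds to milliseconds per sample."""
    return total_seconds / n_samples * 1000.0


def tradeoff(acc_a, time_a, acc_b, time_b):
    """Return (accuracy gain of a over b in percentage points, training-time ratio a/b)."""
    return acc_a - acc_b, time_a / time_b


# Figures from the updated VNTC results list in the system card diff.
lr_ms = per_sample_ms(24.5, 50_373)
gain_pp, slowdown = tradeoff(92.80, 3278.4, 92.33, 31.9)
print(f"LR inference: {lr_ms:.2f} ms/sample")                           # 0.49 ms/sample
print(f"SVC: +{gain_pp:.2f} pp accuracy for {slowdown:.1f}x training time")  # +0.47 pp, 102.8x
```

In other words, SVC gains roughly half a percentage point of test accuracy on VNTC at about a hundred times the logistic regression training time, which is the trade-off the commit message refers to.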