RonanMcGovern committed
Commit 5eff607 · verified · 1 Parent(s): 13ba401

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +19 -2
README.md CHANGED
@@ -52,7 +52,7 @@ Test datasets:
  - `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1` - 30 rows of synthetic English voice data, evenly split across GB and US English source text, GB and US accents, male and female voices, mixed speeds, and with transcripts converted to British English spelling.
  - `openslr/librispeech_asr` - 50 rows from the `test.other` split, which contains mixed English-language samples with high WER.
 
- ### English Variant Transcription Performance
+ ### British (en_GB) Variant Transcription Performance
  While original Whisper models transcribe ~6% of this test set to American English, the fine-tuned model reduces that towards 1% (and <0.2% for the turbo model).
 
  **Dataset:** `Trelis/transcribe-to-en_GB-v1`
@@ -67,6 +67,21 @@ While original Whisper models transcribe ~6% of this test set to American English
  | 2025-12-02 12:28:33 | `openai/whisper-large-v3-turbo` | 7.15% | 30/30/0 | 5.27% | 1.62% | Yes | mps |
  | 2025-12-02 13:11:01 | `Trelis/transcribe-en_gb-spelling-v1-turbo` | 1.18% | 30/30/0 | 0.20% | 6.70% | Yes | mps |
 
+ ### American (en_US) Variant Transcription Performance
+ Original Whisper models already tend to transcribe to American English, so the improvement from fine-tuning is smaller here, although the turbo model still improves by ~1.5%.
+
+ **Dataset:** `Trelis/asr-en_mixed-to-en_US-tts-test-20251202-105023`
+ **Config:** `N/A`
+ **Split:** `test`
+ **Text Column:** `text`
+
+ | Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
+ |-----------|-------|-------|------------------------------|---------|---------|------------|--------|
+ | 2025-12-02 11:03:11 | `openai/whisper-tiny` | 4.93% | 30/30/0 | 6.32% | 0.54% | Yes | mps |
+ | 2025-12-02 13:43:28 | `Trelis/transcribe-en_us-spelling-v1-tiny` | 3.89% | 30/30/0 | 6.38% | 0.27% | Yes | mps |
+ | 2025-12-02 11:02:45 | `openai/whisper-large-v3-turbo` | 4.03% | 30/30/0 | 5.47% | 1.62% | Yes | mps |
+ | 2025-12-02 14:24:32 | `Trelis/transcribe-en_us-spelling-v1-turbo` | 1.25% | 30/30/0 | 6.84% | 0.07% | Yes | mps |
+
  ### LibriSpeech Performance
  LibriSpeech is used here as an independent check on the degradation caused by fine-tuning. Smaller models tend to degrade more when fine-tuned; there is no evidence of degradation on the turbo model:
 
@@ -79,8 +94,10 @@ LibriSpeech is used here as an independent check on the extent of degradation caused by fine-tuning.
  |-----------|-------|-------|------------------------------|---------|---------|------------|--------|
  | 2025-12-02 09:27:52 | `openai/whisper-tiny` | 11.62% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
  | 2025-12-02 12:17:18 | `Trelis/transcribe-en_gb-spelling-v1-tiny` | 13.18% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
+ | 2025-12-02 13:44:04 | `Trelis/transcribe-en_us-spelling-v1-tiny` | 12.40% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
  | 2025-11-27 13:23:00 | `openai/whisper-large-v3-turbo` | 4.47% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
  | 2025-12-02 13:24:33 | `Trelis/transcribe-en_gb-spelling-v1-turbo` | 4.02% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
+ | 2025-12-02 14:37:54 | `Trelis/transcribe-en_us-spelling-v1-turbo` | 4.13% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
 
  ## Inference
  ### Quick Demo (3 samples)
@@ -92,7 +109,7 @@ from datasets import load_dataset
  from transformers import pipeline
 
  DATASET_ID = "Trelis/transcribe-to-en_GB-v1"
- MODEL_ID = "Trelis/transcribe-british-spelling-v1-tiny"
+ MODEL_ID = "Trelis/transcribe-en_gb-spelling-v1-tiny"
 
  print(f"Loading dataset: {DATASET_ID} (first 3 rows)")
  dataset = load_dataset(DATASET_ID, split="test[:3]")
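
The quick-demo snippet in the diff stops after loading the dataset. A minimal sketch of the remaining steps is shown below for orientation; it is not the repository's official script, and the `audio` column name and the use of the generic `automatic-speech-recognition` pipeline are assumptions rather than details confirmed by this README.

```python
# Sketch: continue the quick demo by transcribing the 3 loaded samples.
# Assumes the dataset exposes an `audio` column (array + sampling_rate) and a
# reference transcript column named `text` -- both are assumptions here.
from datasets import load_dataset
from transformers import pipeline

DATASET_ID = "Trelis/transcribe-to-en_GB-v1"
MODEL_ID = "Trelis/transcribe-en_gb-spelling-v1-tiny"

dataset = load_dataset(DATASET_ID, split="test[:3]")

# Build an ASR pipeline from the fine-tuned checkpoint.
asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

for row in dataset:
    audio = row["audio"]  # assumed column name
    prediction = asr({"raw": audio["array"], "sampling_rate": audio["sampling_rate"]})
    print("Reference :", row.get("text", "<no reference column>"))
    print("Predicted :", prediction["text"])
    print("-" * 40)
```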
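The US→GB % and GB→US % columns in the tables above report how often a transcript lands in the other spelling variant. The exact metric behind those numbers is not shown in this excerpt; purely as an illustration, spelling-variant flips between a reference and a model transcript could be counted as in the following sketch, where the word-pair list and the function are hypothetical.

```python
# Illustrative only: count spelling-variant "flips" between a reference
# transcript and a model transcript. The pair list and this definition are
# hypothetical; they are not the evaluation code behind the tables above.
GB_TO_US = {
    "colour": "color",
    "organise": "organize",
    "centre": "center",
    "travelling": "traveling",
    "defence": "defense",
}
US_TO_GB = {us: gb for gb, us in GB_TO_US.items()}

def _tokens(text):
    return [w.strip(".,!?;:").lower() for w in text.split()]

def variant_flips(reference, hypothesis):
    """Count GB->US and US->GB spelling flips between two transcripts."""
    hyp = set(_tokens(hypothesis))
    gb_to_us = sum(1 for w in _tokens(reference) if w in GB_TO_US and GB_TO_US[w] in hyp)
    us_to_gb = sum(1 for w in _tokens(reference) if w in US_TO_GB and US_TO_GB[w] in hyp)
    return gb_to_us, us_to_gb

print(variant_flips("the colour of the centre line", "the color of the centre line"))
# -> (1, 0): one GB spelling ("colour") came back as its US form ("color")
```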