Test datasets:

- `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1` - 30 rows of synthetic English voice data, evenly split across GB and US English source text, GB and US accents, male and female voices, and mixed speeds, with transcripts converted to British English spelling.
- `openslr/librispeech_asr` - 50 rows from the `test.other` split, which contains mixed English-language samples with high WER.

### British (EN_GB) Variant Transcription Performance

While the original Whisper models transcribe ~6% of this test set to American English, the fine-tuned models reduce that towards 1% (and <0.2% for the turbo model).

**Dataset:** `Trelis/transcribe-to-en_GB-v1`

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 12:28:33 | `openai/whisper-large-v3-turbo` | 7.15% | 30/30/0 | 5.27% | 1.62% | Yes | mps |
| 2025-12-02 13:11:01 | `Trelis/transcribe-en_gb-spelling-v1-turbo` | 1.18% | 30/30/0 | 0.20% | 6.70% | Yes | mps |
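The US→GB and GB→US columns report how often spelling variants from the wrong dialect show up in the output. The exact metric definition is not given in this README; the sketch below is one plausible reading, using a tiny illustrative variant map (a real evaluation would use a full US/GB spelling list):

```python
# Minimal sketch: rate of US-vs-GB spelling swaps between a reference
# transcript and a hypothesis. The variant map is a tiny illustrative
# sample, not the full list a real evaluation would use.
US_TO_GB = {"color": "colour", "organize": "organise", "center": "centre"}
GB_TO_US = {gb: us for us, gb in US_TO_GB.items()}

def variant_swap_rate(reference: str, hypothesis: str, mapping: dict) -> float:
    """Fraction of reference words whose spelling variant (per `mapping`)
    appears in the hypothesis in place of the reference form."""
    ref_words = reference.lower().split()
    hyp_words = set(hypothesis.lower().split())
    if not ref_words:
        return 0.0
    swapped = sum(
        1
        for w in ref_words
        if w in mapping and mapping[w] in hyp_words and w not in hyp_words
    )
    return swapped / len(ref_words)

print(f"{variant_swap_rate('the colour of the centre', 'the color of the center', GB_TO_US):.0%}")  # 40%
```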

### American (EN_US) Transcription Performance

Original Whisper models already tend to transcribe to American English, so the improvement from fine-tuning is smaller here, although the turbo model still improves by ~1.5%.

**Dataset:** `Trelis/asr-en_mixed-to-en_US-tts-test-20251202-105023`
**Config:** `N/A`
**Split:** `test`
**Text Column:** `text`

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 11:03:11 | `openai/whisper-tiny` | 4.93% | 30/30/0 | 6.32% | 0.54% | Yes | mps |
| 2025-12-02 13:43:28 | `Trelis/transcribe-en_us-spelling-v1-tiny` | 3.89% | 30/30/0 | 6.38% | 0.27% | Yes | mps |
| 2025-12-02 11:02:45 | `openai/whisper-large-v3-turbo` | 4.03% | 30/30/0 | 5.47% | 1.62% | Yes | mps |
| 2025-12-02 14:24:32 | `Trelis/transcribe-en_us-spelling-v1-turbo` | 1.25% | 30/30/0 | 6.84% | 0.07% | Yes | mps |
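The WER figures in these tables follow the standard definition: word-level edit distance divided by the number of reference words. A minimal sketch (the repo's exact text normalization is not shown in this README):

```python
# Minimal word error rate (WER) sketch: word-level Levenshtein distance
# over the reference word count.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(f"{wer('the cat sat on the mat', 'the cat sit on mat') * 100:.2f}%")  # 33.33%
```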

### LibriSpeech Performance

LibriSpeech is used here as an independent check on the degradation caused by fine-tuning. Smaller models tend to degrade more when fine-tuned, while the turbo model shows no evidence of degradation:

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 09:27:52 | `openai/whisper-tiny` | 11.62% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 12:17:18 | `Trelis/transcribe-en_gb-spelling-v1-tiny` | 13.18% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 13:44:04 | `Trelis/transcribe-en_us-spelling-v1-tiny` | 12.40% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-11-27 13:23:00 | `openai/whisper-large-v3-turbo` | 4.47% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 13:24:33 | `Trelis/transcribe-en_gb-spelling-v1-turbo` | 4.02% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 14:37:54 | `Trelis/transcribe-en_us-spelling-v1-turbo` | 4.13% | 50/50/0 | 0.00% | 0.00% | Yes | mps |

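The "Normalized" column indicates WER was computed on normalized text. The normalizer used is not shown in this README (Whisper evaluations commonly use the `EnglishTextNormalizer` from the `openai-whisper` package); here is a deliberately simplified stand-in to show the idea:

```python
import re

# Simplified stand-in for an English text normalizer: lowercase, strip
# punctuation, and collapse whitespace before scoring WER. A full Whisper
# normalizer also expands contractions, numbers, and abbreviations.
def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # drop punctuation except apostrophes
    return " ".join(text.split())          # collapse runs of whitespace

print(normalize("Hello, World!  It's FINE."))  # hello world it's fine
```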
## Inference

### Quick Demo (3 samples)

```python
from datasets import load_dataset
from transformers import pipeline

DATASET_ID = "Trelis/transcribe-to-en_GB-v1"
MODEL_ID = "Trelis/transcribe-en_gb-spelling-v1-tiny"

print(f"Loading dataset: {DATASET_ID} (first 3 rows)")
dataset = load_dataset(DATASET_ID, split="test[:3]")

# Transcribe the samples (assumes the dataset stores audio in an "audio" column)
asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
for row in dataset:
    print(asr(row["audio"])["text"])
```