Test datasets:

- `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1` - 30 rows of synthetic English voice data, evenly split across GB and US English source text, GB and US accents, male and female voices, and mixed speeds, with transcripts converted to British English spelling.
- `openslr/librispeech_asr` - 50 rows from the `test.other` split, which contains mixed English-language samples with high WER.

### British (EN_GB) Variant Transcription Performance

While the original Whisper models transcribe ~6% of this test set to American English, the fine-tuned models reduce that towards 1% (and <0.2% for the turbo model).

**Dataset:** `Trelis/transcribe-to-en_GB-v1`

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 12:28:33 | `openai/whisper-large-v3-turbo` | 7.15% | 30/30/0 | 5.27% | 1.62% | Yes | mps |
| 2025-12-02 13:11:01 | `Trelis/transcribe-en_gb-spelling-v1-turbo` | 1.18% | 30/30/0 | 0.20% | 6.70% | Yes | mps |
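The US→GB and GB→US columns report how often spelling variants from the wrong dialect show up in the output. The exact metric definition is not given in this README; the sketch below is one plausible reading, using a tiny illustrative variant map (a real evaluation would use a full US/GB spelling list):

```python
# Minimal sketch: rate of US-vs-GB spelling swaps between a reference
# transcript and a hypothesis. The variant map is a tiny illustrative
# sample, not the full list a real evaluation would use.
US_TO_GB = {"color": "colour", "organize": "organise", "center": "centre"}
GB_TO_US = {gb: us for us, gb in US_TO_GB.items()}

def variant_swap_rate(reference: str, hypothesis: str, mapping: dict) -> float:
    """Fraction of reference words whose spelling variant (per `mapping`)
    appears in the hypothesis in place of the reference form."""
    ref_words = reference.lower().split()
    hyp_words = set(hypothesis.lower().split())
    if not ref_words:
        return 0.0
    swapped = sum(
        1
        for w in ref_words
        if w in mapping and mapping[w] in hyp_words and w not in hyp_words
    )
    return swapped / len(ref_words)

print(f"{variant_swap_rate('the colour of the centre', 'the color of the center', GB_TO_US):.0%}")  # 40%
```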

### American (EN_US) Transcription Performance

Original Whisper models already tend to transcribe to American English, so the improvement from fine-tuning is smaller here, although the turbo model still improves by ~1.5%.

**Dataset:** `Trelis/asr-en_mixed-to-en_US-tts-test-20251202-105023`
**Config:** `N/A`
**Split:** `test`
**Text Column:** `text`

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 11:03:11 | `openai/whisper-tiny` | 4.93% | 30/30/0 | 6.32% | 0.54% | Yes | mps |
| 2025-12-02 13:43:28 | `Trelis/transcribe-en_us-spelling-v1-tiny` | 3.89% | 30/30/0 | 6.38% | 0.27% | Yes | mps |
| 2025-12-02 11:02:45 | `openai/whisper-large-v3-turbo` | 4.03% | 30/30/0 | 5.47% | 1.62% | Yes | mps |
| 2025-12-02 14:24:32 | `Trelis/transcribe-en_us-spelling-v1-turbo` | 1.25% | 30/30/0 | 6.84% | 0.07% | Yes | mps |
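The WER figures in these tables follow the standard definition: word-level edit distance divided by the number of reference words. A minimal sketch (the repo's exact text normalization is not shown in this README):

```python
# Minimal word error rate (WER) sketch: word-level Levenshtein distance
# over the reference word count.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(f"{wer('the cat sat on the mat', 'the cat sit on mat') * 100:.2f}%")  # 33.33%
```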

### LibriSpeech Performance

LibriSpeech is used here as an independent check on the degradation caused by fine-tuning. Smaller models tend to degrade more when fine-tuned, while the turbo model shows no evidence of degradation:

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 09:27:52 | `openai/whisper-tiny` | 11.62% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 12:17:18 | `Trelis/transcribe-en_gb-spelling-v1-tiny` | 13.18% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 13:44:04 | `Trelis/transcribe-en_us-spelling-v1-tiny` | 12.40% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-11-27 13:23:00 | `openai/whisper-large-v3-turbo` | 4.47% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 13:24:33 | `Trelis/transcribe-en_gb-spelling-v1-turbo` | 4.02% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 14:37:54 | `Trelis/transcribe-en_us-spelling-v1-turbo` | 4.13% | 50/50/0 | 0.00% | 0.00% | Yes | mps |

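The "Normalized" column indicates WER was computed on normalized text. The normalizer used is not shown in this README (Whisper evaluations commonly use the `EnglishTextNormalizer` from the `openai-whisper` package); here is a deliberately simplified stand-in to show the idea:

```python
import re

# Simplified stand-in for an English text normalizer: lowercase, strip
# punctuation, and collapse whitespace before scoring WER. A full Whisper
# normalizer also expands contractions, numbers, and abbreviations.
def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # drop punctuation except apostrophes
    return " ".join(text.split())          # collapse runs of whitespace

print(normalize("Hello, World!  It's FINE."))  # hello world it's fine
```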
## Inference

### Quick Demo (3 samples)

```python
from datasets import load_dataset
from transformers import pipeline

DATASET_ID = "Trelis/transcribe-to-en_GB-v1"
MODEL_ID = "Trelis/transcribe-en_gb-spelling-v1-tiny"

print(f"Loading dataset: {DATASET_ID} (first 3 rows)")
dataset = load_dataset(DATASET_ID, split="test[:3]")

# Transcribe the samples (assumes the dataset stores audio in an "audio" column)
asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
for row in dataset:
    print(asr(row["audio"])["text"])
```