sneakyfree committed on
Commit d3035b1 · verified · 1 Parent(s): 2ca9457

Refresh README — uniform WindyWord template with WER tier + dialect notes

  1. README.md +3 -22
README.md CHANGED
@@ -6,7 +6,6 @@ tags:
  - windyword
  - english
  - multilingual
- - multilingual-fallback
  library_name: transformers
  pipeline_tag: automatic-speech-recognition
  language:
@@ -16,32 +15,13 @@ language:

  # WindyWord.ai STT — Windy Pro Engine

- **The flagship multilingual speech-to-text engine. Transcribes audio in 99+ languages with state-of-the-art quality.**
-
- ## Recommended fallback for low-resource languages
-
- This is the **multilingual fallback model** for the WindyWord STT fleet. When a language-specific Lingua model is missing or underperforms (we explicitly flag these in the language-specific READMEs), production users should route through this model with the appropriate `language=` hint:
-
- ```python
- from transformers import WhisperForConditionalGeneration, WhisperProcessor
- processor = WhisperProcessor.from_pretrained("WindyWord/listen-windy-pro-engine", subfolder="safetensors")
- model = WhisperForConditionalGeneration.from_pretrained("WindyWord/listen-windy-pro-engine", subfolder="safetensors")
-
- # ig (Igbo), mn (Mongolian), or any thin-coverage language:
- ids = model.generate(input_features, language="ig", task="transcribe")
- ```
-
- Languages currently flagged for this fallback:
- - **Igbo (ig)** — community ASR thin; only available fine-tune is whisper-tiny which is 39M params.
- - **Mongolian (mn)** — both predecessor and upgrade attempts have audited at ~100% WER on FLEURS.
- - **Hebrew (he)**, **Malayalam (ml)** — current language-specific models are MARGINAL; whisper-large-v3 may give better real-world results.
+ **Multilingual speech-to-text engine. Transcribes audio in 100+ languages, with English as the primary trained domain.**

  ## Profile

  - **Architecture:** 1.55B params · whisper-large-v3
  - **Profile:** premium / max accuracy
  - **Base model:** [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- - **Multilingual:** 99 languages directly supported; auto-detects language by default

  ## Variants in this repo

@@ -63,6 +43,7 @@ model = WhisperForConditionalGeneration.from_pretrained("WindyWord/listen-windy-
  For CPU inference via CTranslate2:
  ```python
  import ctranslate2
+ # After downloading the ct2-int8 subfolder:
  model = ctranslate2.models.Whisper("path/to/ct2-int8/")
  ```

@@ -74,6 +55,6 @@ Part of the [WindyWord.ai](https://windyword.ai) STT fleet. Visit windyword.ai f

  ## Provenance & License

- Weights derived from [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) under Apache-2.0 (inherited). Proprietary fine-tuning by WindyWord.ai team via LoRA fog-of-mirror methodology where applicable.
+ Weights derived from [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) under Apache-2.0 (inherited). Voice tiers are direct redistributions of the upstream community Whisper / distil-whisper variants; no LoRA fine-tuning has been applied to these voice models.

  *Certified by Opus 4.6 Opus-Claw (Dr. C) on Veron-1 (RTX 5090, Mt Pleasant SC).*
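
The comment added in the CTranslate2 hunk assumes the `ct2-int8` subfolder is already on disk, but the diff does not show how to fetch it. A minimal sketch of that step follows. It is not taken from the README. It assumes the repo id `WindyWord/listen-windy-pro-engine` (as used in the removed `transformers` example) and that the subfolder is literally named `ct2-int8`. The helper names `ct2_allow_patterns` and `load_ct2_whisper` are hypothetical.

```python
import os


def ct2_allow_patterns(subfolder: str = "ct2-int8") -> list[str]:
    """Glob patterns that restrict a Hub download to one subfolder."""
    return [f"{subfolder.strip('/')}/*"]


def load_ct2_whisper(repo_id: str = "WindyWord/listen-windy-pro-engine",
                     subfolder: str = "ct2-int8"):
    """Download only `subfolder` from the Hub and open it with CTranslate2.

    Both default values are assumptions inferred from the README text.
    """
    # Imported lazily so ct2_allow_patterns works without these packages.
    import ctranslate2
    from huggingface_hub import snapshot_download

    # snapshot_download returns the local snapshot directory; allow_patterns
    # limits the download to files under the requested subfolder.
    local_root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=ct2_allow_patterns(subfolder),
    )
    model_dir = os.path.join(local_root, subfolder)

    # CPU device with int8 compute, matching the "CPU inference" note.
    return ctranslate2.models.Whisper(model_dir, device="cpu", compute_type="int8")
```

Calling `load_ct2_whisper()` would replace the hard-coded `"path/to/ct2-int8/"` in the README snippet. Restricting the download with `allow_patterns` avoids pulling the other variants in the repo.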