Commit 5753db4 by DewiBrynJones · verified · 1 Parent(s): d3f656b

Update README.md

Files changed (1)
  1. README.md +21 -10
README.md CHANGED
@@ -1,17 +1,28 @@
  ---
  language:
- - cy
- - en
  license: apache-2.0
  library_name: transformers
  pipeline_tag: automatic-speech-recognition
  tags:
- - whisper
- - welsh
- - cymraeg
- - speech-recognition
- - translation
  base_model: openai/whisper-large-v2
  ---
 
  # Whisper Large — Welsh & English (techiaith/whisper-large-ft-cy-en)
@@ -54,14 +65,14 @@ Total training data: **~177 hours** across **153,066 clips**.
 
  | Dataset | Language | Duration | Clips | Description |
  |---------|----------|----------|-------|-------------|
- | [DewiBrynJones/banc-trawsgrifiadau-bangor](https://huggingface.co/datasets/DewiBrynJones/banc-trawsgrifiadau-bangor-2602) | Welsh | 52:45h | 48,569 | Mixed spontaneous & read speech |
  | [techiaith/corpws-clllc-wlga](https://huggingface.co/datasets/techiaith/corpws-clllc-wlga) | Welsh | 32:59h | 26,216 | Local government meetings |
  | [cymen-arfor/lleisiau-arfor](https://huggingface.co/datasets/cymen-arfor/lleisiau-arfor) | Welsh | 33:54h | 33,614 | Spontaneous conversational speech |
  | [techiaith/commonvoice_23_0_cy](https://huggingface.co/datasets/techiaith/commonvoice_23_0_cy) | Welsh | 31:11h | 20,018 | Read speech (CommonVoice 23.0) |
  | [techiaith/commonvoice_vad_cy](https://huggingface.co/datasets/techiaith/commonvoice_vad_cy) | Welsh | 3:27h | 8,209 | VAD-segmented clips |
  | [techiaith/commonvoice_23_0_en__GB_IE](https://huggingface.co/datasets/techiaith/commonvoice_23_0_en__GB_IE) | English | 22:26h | 16,440 | Read speech, UK/Irish accents (10% sample) |
 
- Validation: [DewiBrynJones/banc-trawsgrifiadau-bangor](https://huggingface.co/datasets/DewiBrynJones/banc-trawsgrifiadau-bangor-2602) validation split (4:00h, 3,895 clips).
 
  ## Training Configuration
 
@@ -108,4 +119,4 @@ A CTranslate2 (int8 quantised) version is available at [techiaith/whisper-large-
 
  Developed by [Uned Technolegau Iaith, Prifysgol Bangor / Language Technologies Unit, Bangor University](https://techiaith.bangor.ac.uk/).
 
- Funded by the Welsh Government.
 
  ---
  language:
+ - cy
+ - en
  license: apache-2.0
  library_name: transformers
  pipeline_tag: automatic-speech-recognition
  tags:
+ - whisper
+ - welsh
+ - cymraeg
+ - speech-recognition
+ - translation
  base_model: openai/whisper-large-v2
+ datasets:
+ - techiaith/banc-trawsgrifiadau-bangor
+ - techiaith/corpws-clllc-wlga
+ - cymen-arfor/lleisiau-arfor
+ - techiaith/commonvoice_23_0_cy
+ - techiaith/commonvoice_vad_cy
+ - techiaith/commonvoice_23_0_en__GB_IE
+ - techiaith/commonvoice-23-0-cy-en
+ metrics:
+ - wer
+ - cer
  ---
 
  # Whisper Large — Welsh & English (techiaith/whisper-large-ft-cy-en)
 
 
  | Dataset | Language | Duration | Clips | Description |
  |---------|----------|----------|-------|-------------|
+ | [techiaith/banc-trawsgrifiadau-bangor](https://huggingface.co/datasets/techiaith/banc-trawsgrifiadau-bangor) | Welsh | 52:45h | 48,569 | Mixed spontaneous & read speech |
  | [techiaith/corpws-clllc-wlga](https://huggingface.co/datasets/techiaith/corpws-clllc-wlga) | Welsh | 32:59h | 26,216 | Local government meetings |
  | [cymen-arfor/lleisiau-arfor](https://huggingface.co/datasets/cymen-arfor/lleisiau-arfor) | Welsh | 33:54h | 33,614 | Spontaneous conversational speech |
  | [techiaith/commonvoice_23_0_cy](https://huggingface.co/datasets/techiaith/commonvoice_23_0_cy) | Welsh | 31:11h | 20,018 | Read speech (CommonVoice 23.0) |
  | [techiaith/commonvoice_vad_cy](https://huggingface.co/datasets/techiaith/commonvoice_vad_cy) | Welsh | 3:27h | 8,209 | VAD-segmented clips |
  | [techiaith/commonvoice_23_0_en__GB_IE](https://huggingface.co/datasets/techiaith/commonvoice_23_0_en__GB_IE) | English | 22:26h | 16,440 | Read speech, UK/Irish accents (10% sample) |
 
+ Validation: [techiaith/banc-trawsgrifiadau-bangor](https://huggingface.co/datasets/DewiBrynJones/banc-trawsgrifiadau-bangor) validation split (4:00h, 3,895 clips).
 
  ## Training Configuration
 
 
 
  Developed by [Uned Technolegau Iaith, Prifysgol Bangor / Language Technologies Unit, Bangor University](https://techiaith.bangor.ac.uk/).
 
+ Funded by the Welsh Government.
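
Review note: the second hunk header quotes the README's stated total of "~177 hours" across "153,066 clips". The sketch below is one way to check the six table rows against that figure. It assumes the Duration column is in `h:mm` format, which the diff does not state explicitly. The names `TRAINING_SETS`, `to_minutes` and `summarise` are illustrative only and do not come from the repository.

```python
# Durations and clip counts copied from the training-data table in the diff.
# Assumption: "52:45h" means 52 hours 45 minutes (h:mm).
TRAINING_SETS = {
    "techiaith/banc-trawsgrifiadau-bangor": ("52:45", 48_569),
    "techiaith/corpws-clllc-wlga": ("32:59", 26_216),
    "cymen-arfor/lleisiau-arfor": ("33:54", 33_614),
    "techiaith/commonvoice_23_0_cy": ("31:11", 20_018),
    "techiaith/commonvoice_vad_cy": ("3:27", 8_209),
    "techiaith/commonvoice_23_0_en__GB_IE": ("22:26", 16_440),
}


def to_minutes(duration: str) -> int:
    """Convert an 'h:mm' string such as '52:45' to whole minutes."""
    hours, minutes = duration.split(":")
    return int(hours) * 60 + int(minutes)


def summarise(datasets: dict) -> tuple:
    """Return (total_minutes, total_clips) for a name -> (duration, clips) mapping."""
    total_minutes = sum(to_minutes(duration) for duration, _ in datasets.values())
    total_clips = sum(clips for _, clips in datasets.values())
    return total_minutes, total_clips


total_minutes, total_clips = summarise(TRAINING_SETS)
print(f"{total_minutes // 60}:{total_minutes % 60:02d}h across {total_clips:,} clips")
```

Under that assumption the rows sum to 176:42h and 153,066 clips, which is consistent with the stated totals. The validation split (4:00h, 3,895 clips) is not included in the sum.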