Update README.md

README.md
@@ -23,12 +23,12 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 29.52
 ---

 # Wav2Vec2-Large-XLSR-53-lg

-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Luganda using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset, using train, validation and other (
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Luganda using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset, using train, validation and other (excluding voices that are in the test set), and taking the test data for validation as well as test.
 When using this model, make sure that your speech input is sampled at 16kHz.

 ## Usage
@@ -126,10 +126,11 @@ result = test_dataset.map(evaluate, batched=True, batch_size=8)
 print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["norm_text"])))
 ```

-**Test Result**:
+**Test Result**: 29.52 %

 ## Training

-The Common Voice `train`, `validation` and `other` datasets were used for training, augmented to twice the original size with added noise and manipulated pitch, phase and intensity.
+The Common Voice `train`, `validation` and `other` datasets were used for training, excluding voices that are in both the `other` and `test` datasets. The data was augmented to twice the original size with added noise and manipulated pitch, phase and intensity.
+Training proceeded for 60 epochs, on 1 V100 GPU provided by OVHcloud. The `test` data was used for validation.

-The script used for training
+The [script used for training](https://github.com/serapio/transformers/blob/feature/xlsr-finetune/examples/research_projects/wav2vec2/run_common_voice.py) is adapted from the [example script provided in the transformers repo](https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/run_common_voice.py).
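The card above requires speech input sampled at 16 kHz, so audio captured at other rates (e.g. 48 kHz) must be resampled before inference. A minimal dependency-free sketch of that step using linear interpolation; the `resample` helper is illustrative only (in practice one would use a proper resampler such as `torchaudio.transforms.Resample` or `librosa.resample`):

```python
def resample(samples, src_rate, dst_rate):
    """Linearly interpolate a mono waveform from src_rate to dst_rate.

    A toy stand-in for a real polyphase resampler; good enough to show
    the 16 kHz requirement, not for production audio quality.
    """
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional index into source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Downsample a 48 kHz capture to the 16 kHz the model expects.
audio_48k = [0.0, 0.3, 0.6, 0.3, 0.0, -0.3] * 100
audio_16k = resample(audio_48k, 48_000, 16_000)
```

The output length shrinks by the rate ratio (here 3x), while the waveform's overall shape is preserved.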
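The evaluation snippet in the diff reports word error rate via `wer.compute(...)` from the Hugging Face metrics library. For readers unfamiliar with the metric, here is a self-contained sketch of what WER measures: the word-level Levenshtein (edit) distance between reference and hypothesis, divided by the reference length. This toy implementation is not the library's code:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / ref words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)
```

A reported test result of 29.52 % means roughly three word-level errors per ten reference words on the held-out Common Voice test set.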
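The training section says the data was augmented to twice its original size with added noise and manipulated pitch, phase and intensity; the card does not show the augmentation code. A minimal sketch of the noise-and-intensity part, with a hypothetical `augment_twice` helper (pitch and phase manipulation would need a DSP library and are omitted):

```python
import random

def augment_twice(waveforms, noise_std=0.005, gain_range=(0.8, 1.2), seed=0):
    """Return the original clips plus one perturbed copy of each,
    doubling the dataset. Each copy gets a random intensity (gain)
    change and additive Gaussian noise.
    """
    rng = random.Random(seed)
    augmented = list(waveforms)
    for clip in waveforms:
        gain = rng.uniform(*gain_range)
        augmented.append([gain * s + rng.gauss(0.0, noise_std) for s in clip])
    return augmented

clips = [[0.1, 0.2, 0.1], [0.0, -0.1, 0.0]]
doubled = augment_twice(clips)
```

Keeping the clean originals alongside the perturbed copies is what yields the "twice the original size" dataset described above.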