Update README.md

README.md
@@ -23,12 +23,12 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 29.52
 ---

 # Wav2Vec2-Large-XLSR-53-lg

-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Luganda using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset, using train, validation and other (
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Luganda using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset, using train, validation and other (excluding voices that are in the test set), and taking the test data for validation as well as test.
 When using this model, make sure that your speech input is sampled at 16kHz.

 ## Usage
@@ -126,10 +126,11 @@ result = test_dataset.map(evaluate, batched=True, batch_size=8)
 print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["norm_text"])))
 ```

-**Test Result**:
+**Test Result**: 29.52 %

 ## Training

-The Common Voice `train`, `validation` and `other` datasets were used for training, augmented to twice the original size with added noise and manipulated pitch, phase and intensity.
+The Common Voice `train`, `validation` and `other` datasets were used for training, excluding voices that are in both the `other` and `test` datasets. The data was augmented to twice the original size with added noise and manipulated pitch, phase and intensity.
+Training proceeded for 60 epochs, on 1 V100 GPU provided by OVHcloud. The `test` data was used for validation.

-The script used for training
+The [script used for training](https://github.com/serapio/transformers/blob/feature/xlsr-finetune/examples/research_projects/wav2vec2/run_common_voice.py) is adapted from the [example script provided in the transformers repo](https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/run_common_voice.py).
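The card above requires speech input sampled at 16 kHz, so audio captured at other rates (e.g. 48 kHz) must be resampled before inference. A minimal dependency-free sketch of that step using linear interpolation; the `resample` helper is illustrative only (in practice one would use a proper resampler such as `torchaudio.transforms.Resample` or `librosa.resample`):

```python
def resample(samples, src_rate, dst_rate):
    """Linearly interpolate a mono waveform from src_rate to dst_rate.

    A toy stand-in for a real polyphase resampler; good enough to show
    the 16 kHz requirement, not for production audio quality.
    """
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional index into source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Downsample a 48 kHz capture to the 16 kHz the model expects.
audio_48k = [0.0, 0.3, 0.6, 0.3, 0.0, -0.3] * 100
audio_16k = resample(audio_48k, 48_000, 16_000)
```

The output length shrinks by the rate ratio (here 3x), while the waveform's overall shape is preserved.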
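The evaluation snippet in the diff reports word error rate via `wer.compute(...)` from the Hugging Face metrics library. For readers unfamiliar with the metric, here is a self-contained sketch of what WER measures: the word-level Levenshtein (edit) distance between reference and hypothesis, divided by the reference length. This toy implementation is not the library's code:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / ref words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)
```

A reported test result of 29.52 % means roughly three word-level errors per ten reference words on the held-out Common Voice test set.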
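The training section says the data was augmented to twice its original size with added noise and manipulated pitch, phase and intensity; the card does not show the augmentation code. A minimal sketch of the noise-and-intensity part, with a hypothetical `augment_twice` helper (pitch and phase manipulation would need a DSP library and are omitted):

```python
import random

def augment_twice(waveforms, noise_std=0.005, gain_range=(0.8, 1.2), seed=0):
    """Return the original clips plus one perturbed copy of each,
    doubling the dataset. Each copy gets a random intensity (gain)
    change and additive Gaussian noise.
    """
    rng = random.Random(seed)
    augmented = list(waveforms)
    for clip in waveforms:
        gain = rng.uniform(*gain_range)
        augmented.append([gain * s + rng.gauss(0.0, noise_std) for s in clip])
    return augmented

clips = [[0.1, 0.2, 0.1], [0.0, -0.1, 0.0]]
doubled = augment_twice(clips)
```

Keeping the clean originals alongside the perturbed copies is what yields the "twice the original size" dataset described above.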