Fine-Tuning-of-XLSR-Wav2Vec2-on-Arabic

  • This repository is part of my participation in the Hugging Face XLSR-Wav2Vec2 Fine-Tuning Week, where I fine-tuned XLSR-Wav2Vec2 on the Arabic subset of the Common Voice Corpus 4 dataset.

  • The mini_arabic.ipynb notebook contains all data preprocessing and training steps.

  • The evaluation.ipynb notebook contains the evaluation (testing) steps.
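The sprint compared submissions by word error rate (WER) on the Common Voice test split. The exact code in evaluation.ipynb is not reproduced here; as a rough, library-free sketch (the function name `word_error_rate` is hypothetical), corpus-level WER is the total number of word-level substitutions, deletions and insertions divided by the total number of reference words:

```python
def word_error_rate(references: list[str], predictions: list[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words.

    Hypothetical helper for illustration; not the code from evaluation.ipynb.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, predictions):
        r, h = ref.split(), hyp.split()
        # prev[j] holds the edit distance between the reference words seen so
        # far and the first j hypothesis words (Levenshtein distance over words).
        prev = list(range(len(h) + 1))
        for i in range(1, len(r) + 1):
            curr = [i] + [0] * len(h)
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                curr[j] = min(
                    prev[j] + 1,         # deletion of r[i-1]
                    curr[j - 1] + 1,     # insertion of h[j-1]
                    prev[j - 1] + cost,  # substitution or exact match
                )
            prev = curr
        total_edits += prev[len(h)]
        total_words += len(r)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words
```

For example, `word_error_rate(["ذهب الولد إلى المدرسة"], ["ذهب الولد الى المدرسة"])` returns 0.25: a single alef-hamza difference counts as a whole-word substitution, which is one reason transcript normalization matters for Arabic.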

Useful Links:

  • Download the fine-tuned model from the Hugging Face Model Hub https://huggingface.co/anas/wav2vec2-large-xlsr-arabic

  • Download the Common Voice dataset https://commonvoice.mozilla.org/en/datasets

  • Sprint announcement https://discuss.huggingface.co/t/open-to-the-community-xlsr-wav2vec2-fine-tuning-week-for-low-resource-languages/4467

  • Additional info about the event https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md

  • Preprocessing the transcriptions https://github.com/saobou/arabic-text-preprocessing/blob/master/Preprocess.ipynb
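The linked notebook defines its own cleanup rules. As an illustration only, the sketch below assumes a few common Arabic normalizations (stripping the tashkeel diacritics U+064B to U+0652 and the superscript alef U+0670, removing the tatweel U+0640, and deleting Latin and Arabic punctuation), then builds a character vocabulary in the style of the XLSR fine-tuning guide, where the space becomes the `|` word delimiter and `[UNK]`/`[PAD]` are appended. `normalize_transcript` and `build_vocab` are hypothetical helper names:

```python
import re

# Assumed cleanup rules (illustrative; not necessarily those in Preprocess.ipynb).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # tashkeel and superscript alef
TATWEEL = "\u0640"                                   # Arabic elongation character
# Latin punctuation plus Arabic comma, semicolon, question mark and guillemets.
PUNCTUATION = re.compile(r"[.,?!;:'\"\-\u060C\u061B\u061F\u00AB\u00BB]")


def normalize_transcript(text: str) -> str:
    """Strip diacritics, tatweel and punctuation, then collapse whitespace."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = PUNCTUATION.sub("", text)
    return " ".join(text.split())


def build_vocab(transcripts: list[str]) -> dict[str, int]:
    """Map every character seen in the transcripts to a CTC label id."""
    chars = sorted(set("".join(transcripts)))
    vocab = {c: i for i, c in enumerate(chars)}
    # Use "|" as the word delimiter instead of a literal space, keeping its id.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Special tokens are appended at the end of the vocabulary.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab
```

Following the fine-tuning guide's approach, the resulting dictionary could then be written to a `vocab.json` file with `json.dump` and loaded with `Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")`.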