Wav2vec2 English speech recognition in PaddlePaddle

Wav2vec2 in PaddlePaddle

This is a PaddlePaddle implementation of Facebook's wav2vec 2.0 [1], with code and pre-trained weights ported from Fairseq and Hugging Face.

Dependency

Install PaddlePaddle 2.0.1:

pip install paddlepaddle-gpu==2.0.1

Install PaddleAudio from source:

git clone https://github.com/PaddlePaddle/models
cd models/PaddleAudio
pip install -e .
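
To confirm that both installs succeeded, a minimal sketch like the following can report what pip registered. The distribution names `paddlepaddle-gpu`, `paddlepaddle`, and `paddleaudio` are assumptions about how the packages are registered, and `installed_version` is a hypothetical helper; adjust the names if `pip list` shows something different.

```python
from importlib import metadata


def installed_version(candidates):
    """Return (name, version) for the first installed distribution, else None."""
    for name in candidates:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    # Distribution names below are assumptions, not taken from this README.
    checks = {
        "PaddlePaddle": ["paddlepaddle-gpu", "paddlepaddle"],
        "PaddleAudio": ["paddleaudio"],
    }
    for label, names in checks.items():
        found = installed_version(names)
        if found is None:
            print(f"{label}: not installed")
        else:
            print(f"{label}: {found[0]} {found[1]}")
```

If PaddlePaddle is reported with a version other than 2.0.1, reinstalling with the pinned version above is the safest option.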

Supported configs

Name | Fine-tuning split | Dataset
wav2vec2-base-960h | 960h | LibriSpeech
wav2vec2-large-960h | 960h | LibriSpeech
wav2vec2-large-960h-lv60 | 960h | LibriSpeech + Libri-Light
wav2vec2-large-960h-lv60-self | 960h | LibriSpeech + Libri-Light + self-training

Quickstart

Clone the project:

git clone https://github.com/ranchlai/wav2vec2.paddle
cd wav2vec2.paddle

Run the speech recognition test on your audio file:

python test.py --device "gpu:0" --audio "LJ001-0186.wav" --config "wav2vec2-large-960h-lv60"

If successful, you will see output like this:

pred==> position of our society that a work of utility might be also a work of art if we cared to make it so
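
If you call the script from other tooling, the transcript can be pulled out of the captured output. The sketch below assumes the prediction always appears on a single line after the `pred==>` marker, as in the sample above; `extract_prediction` is a hypothetical helper, not part of this repository.

```python
def extract_prediction(output, marker="pred==>"):
    """Return the text after the marker on the first matching line, or None."""
    for line in output.splitlines():
        _, found, text = line.partition(marker)
        if found:  # partition returns an empty separator when the marker is absent
            return text.strip()
    return None


if __name__ == "__main__":
    sample = (
        "pred==> position of our society that a work of utility "
        "might be also a work of art if we cared to make it so"
    )
    print(extract_prediction(sample))
```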

If you do not have a GPU or run out of GPU memory, run on the CPU instead:

python test.py --device "cpu" --audio "LJ001-0186.wav" --config "wav2vec2-large-960h-lv60"
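
The GPU-then-CPU retry can also be automated. The following is a minimal sketch, not part of the repository: it assumes it is run from the project root, that `test.py` exits with a non-zero status when a device is unavailable or runs out of memory, and it uses only the `--device`, `--audio`, and `--config` flags shown above. The helper names `build_command` and `transcribe_with_fallback` are hypothetical.

```python
import subprocess
import sys


def build_command(audio, config, device):
    """Build the test.py command line using the flags documented above."""
    return [
        sys.executable, "test.py",
        "--device", device,
        "--audio", audio,
        "--config", config,
    ]


def transcribe_with_fallback(audio, config, devices=("gpu:0", "cpu"),
                             run=subprocess.run):
    """Try each device in order; return (device, stdout) for the first success."""
    errors = []
    for device in devices:
        result = run(build_command(audio, config, device),
                     capture_output=True, text=True)
        # Assumption: a failed run is signalled by a non-zero exit status.
        if result.returncode == 0:
            return device, result.stdout
        errors.append(f"{device}: exit status {result.returncode}")
    raise RuntimeError("all devices failed (" + "; ".join(errors) + ")")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python run_with_fallback.py <audio.wav> <config-name>
    used_device, output = transcribe_with_fallback(sys.argv[1], sys.argv[2])
    print(f"device used: {used_device}")
    print(output)
```

The `run` parameter exists only so the retry logic can be exercised without a GPU or the model weights; in normal use the default `subprocess.run` is what executes the script.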

Reference

[1] Baevski, Alexei, et al. “Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.” Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 12449–12460.