- Launch Docker container (a sketch of an assumed compose file follows this list):

  ```
  docker-compose run gpu
  ```
- Fetch XLS-R checkpoint:

  ```
  wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/xlsr2_300m.pt -P tmp/
  ```
- Download a folder of audio into `data`, e.g. `data/wav`
- Create a manifest of the wav folder (the manifest format is sketched after this list):

  ```
  python wav2vec_manifest.py /workspace/data/wav --ext wav --valid-percent 0.01 --dest data/manifest/
  ```
- Adjust training parameters in `w2v2-large-cpt.yaml` (e.g. change `dataset.max_tokens: 150000` to whatever suits your GPU setup; see the config sketch after this list)
- Run continued pretraining:

  ```
  fairseq-hydra-train task.data=/workspace/data/manifest \
      checkpoint.finetune_from_model=/workspace/tmp/xlsr2_300m.pt \
      common.log_format=simple \
      common.fp16=True \
      --config-dir /workspace \
      --config-name w2v2-large-cpt.yaml
  ```
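A minimal sketch of the `docker-compose.yml` assumed by the first step. Only the `gpu` service name comes from the command above; the image, mount, and GPU reservation are assumptions to adapt to your setup:

```
# Hypothetical compose file for the `gpu` service; contents are assumed.
services:
  gpu:
    image: pytorch/pytorch:latest    # assumed image; any CUDA image with fairseq installed works
    working_dir: /workspace
    volumes:
      - .:/workspace                 # mounts the repo at /workspace, matching the paths in the steps above
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]    # GPU reservations need docker-compose >= 1.28
```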
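For reference, fairseq's `wav2vec_manifest.py` writes `train.tsv` and `valid.tsv` into the `--dest` directory. Each manifest begins with the root audio directory, followed by one tab-separated relative path and sample count per line; the file names and lengths below are invented for illustration:

```
/workspace/data/wav
clip_0001.wav	240000
clip_0002.wav	187520
```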
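The relevant section of `w2v2-large-cpt.yaml` looks roughly like this (a sketch assuming the standard fairseq Hydra config layout, showing only the key mentioned above):

```
# Sketch: dataset section of w2v2-large-cpt.yaml (fairseq Hydra layout assumed)
dataset:
  max_tokens: 150000   # max audio samples per GPU batch; lower this to avoid out-of-memory errors
```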
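Since training is driven by Hydra, any other config key can be overridden inline in the same `group.key=value` form. The sketch below adds a few common overrides; `checkpoint.save_dir`, `optimization.max_update`, and `distributed_training.distributed_world_size` are standard fairseq options, but the values here are illustrative:

```
fairseq-hydra-train task.data=/workspace/data/manifest \
    checkpoint.finetune_from_model=/workspace/tmp/xlsr2_300m.pt \
    checkpoint.save_dir=/workspace/checkpoints \
    optimization.max_update=100000 \
    distributed_training.distributed_world_size=2 \
    common.fp16=True \
    --config-dir /workspace \
    --config-name w2v2-large-cpt.yaml
```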