s3lspeech

Self-supervised approach for learning speech representations in a sustainable way


Installation

A Dockerfile based on the NVIDIA 23.04 image is available at docker/Dockerfile, with all the dependencies and libraries needed to run the experiments. Otherwise, install manually:

Create a virtual environment and activate it

sudo apt-get update
sudo apt-get install -y python3-pip python3-dev python3-tk
sudo pip3 install -U virtualenv
virtualenv --system-site-packages -p python3 ~/torch21
source ~/torch21/bin/activate
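Before installing packages, it can help to confirm that the environment is actually active. A minimal sketch, assuming it is run with the python3 from the activated shell; in_virtualenv is a hypothetical helper name, not part of this repository:

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases (before 20) set sys.real_prefix.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
```

If this prints False, re-run the source command above in the current shell before installing the requirements.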

Install Python packages

pip3 install --upgrade pip
pip3 install -r requirements.txt

Experiments

The settings needed to replicate the results are stored in config/s3lspeech.py.

Download datasets

python3 main.py --run download

Pretrain the model

python3 main.py --run pretrain

Finetune the model for ASR

python3 main.py --run finetune

Results are written to the log file s3lspeech_results.pt_log. The pretrained and finetuned checkpoints are stored under data/.
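The three stages above can also be chained from a single script. This is a minimal sketch, assuming main.py is run from the repository root and exits with a non-zero status when a stage fails (the source does not confirm this); build_command and run_pipeline are hypothetical helper names, not part of this repository:

```python
import subprocess
import sys

# Stage names taken from the commands above, in the order they are run.
STAGES = ["download", "pretrain", "finetune"]


def build_command(stage, python=sys.executable):
    """Build the argument list for one stage, e.g. python3 main.py --run pretrain."""
    return [python, "main.py", "--run", stage]


def run_pipeline(stages=STAGES, runner=subprocess.run):
    """Run each stage in order; check=True stops at the first failing stage."""
    for stage in stages:
        runner(build_command(stage), check=True)
```

Calling run_pipeline() from the repository root runs all three stages in order, while run_pipeline(stages=["finetune"]) would rerun only the last one. The runner parameter exists only so the command sequence can be inspected without launching anything.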

License

This project is licensed under the terms of the MIT license. See the LICENSE file for more information.