An efficient self-supervised learning (ESSL) model for learning speech representations. The focus is on computational cost: pretraining runs with limited resources, and the model is evaluated with automatic speech recognition (ASR) as the downstream task.
Create a virtual environment and activate it:
sudo apt-get update
sudo apt-get install -y python3-pip python3-dev python3-tk
sudo pip3 install -U virtualenv
virtualenv --system-site-packages -p python3 ~/torch21
source ~/torch21/bin/activate
Now, install the Python packages:
pip3 install --upgrade pip
pip3 install -r requirements.txt
For replicating the results, settings are stored in config/efficientssl.py. Experiments use an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory. Make sure you have at least 24 GB of GPU memory available to avoid out-of-memory exceptions; otherwise, decrease self.batch_length in config/efficientssl.py and adjust the number of training steps accordingly.
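As a rough guide, the snippet below is a hypothetical excerpt of config/efficientssl.py illustrating that trade-off; only self.batch_length is named in this README, so the other attribute name and all values are assumptions, not the actual configuration.

class EfficientSSLConfig:
    def __init__(self):
        # Halving the batch length roughly halves peak GPU memory during pretraining.
        self.batch_length = 100.0      # assumed original value 200.0, halved for a smaller GPU
        # Keep the total amount of audio seen during pretraining roughly constant:
        # if batch_length is halved, double the number of training steps.
        self.training_steps = 200_000  # hypothetical attribute; assumed original value 100_000, doubled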
Download datasets:
python3 main.py --run download
Pretrain and finetune the model for ASR:
python3 main.py --run train
Results are stored in the log file data/marcel_exp0_resutls.pt_log. Checkpoints are stored under data/.
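To quickly inspect what a run produced, something like the following can be used; the checkpoint file name and its contents are assumptions based on the log file name above, not guarantees from the code.

import torch

# Checkpoint path assumed from the log file name above; adjust it to whatever
# file your run actually writes under data/.
checkpoint = torch.load("data/marcel_exp0_resutls.pt", map_location="cpu")
print(type(checkpoint))
if isinstance(checkpoint, dict):
    # Checkpoints saved as dictionaries typically expose model/optimizer states by key.
    print(list(checkpoint.keys()))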
@inproceedings{lugo2024towards,
title={Towards efficient self-supervised representation learning in speech processing},
author={Lugo, Luis and Vielzeuf, Valentin},
booktitle={Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics},
year={2024},
}
This project is licensed under the terms of the MIT license. See the LICENSE file for more information.