
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'


EnCodecMAE: Leveraging neural codecs for universal audio representation learning


This is EnCodecMAE, an audio feature extractor pretrained with masked language modelling to predict discrete targets generated by EnCodec, a neural audio codec. For more details about the architecture and pretraining procedure, read the paper.

Updates:

Usage

Feature extraction using pretrained models

Try our example Colab notebook, or follow these steps:

1) Clone the EnCodecMAE repository:

git clone https://github.com/habla-liaa/encodecmae.git

2) Install it:

cd encodecmae
pip install -e .

3) Extract embeddings in Python:

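from encodecmae import load_model

# Load the pretrained 'mel256-ec-base_st' model on the first GPU
model = load_model('mel256-ec-base_st', device='cuda:0')
# Extract features from a single audio file
features = model.extract_features_from_file('gsc/bed/00176480_nohash_0.wav')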
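If you need one fixed-size vector per clip (for example, to train a linear classifier), a common approach is to average the frame-level features over time. The sketch below assumes that `extract_features_from_file` returns an array-like of shape `(frames, dim)` on the CPU; that shape is an assumption, so check the output of the model you load (and move it to the CPU first if it is a GPU tensor). The helpers `mean_pool` and `embed_folder` are hypothetical and not part of the encodecmae API.

```python
from pathlib import Path

import numpy as np


def mean_pool(features):
    """Average frame-level features of shape (frames, dim) into a single (dim,) vector."""
    features = np.asarray(features)
    if features.ndim != 2:
        raise ValueError(f"expected shape (frames, dim), got {features.shape}")
    return features.mean(axis=0)


def embed_folder(model, folder, pattern="*.wav"):
    """Hypothetical helper: return {relative_path: pooled_embedding} for each file in `folder`.

    `model` is any object exposing extract_features_from_file(path), such as the
    object returned by encodecmae.load_model (its output shape is assumed, see above).
    """
    root = Path(folder)
    embeddings = {}
    for path in sorted(root.rglob(pattern)):
        frames = model.extract_features_from_file(str(path))
        embeddings[str(path.relative_to(root))] = mean_pool(frames)
    return embeddings
```

For instance, `embed_folder(model, 'gsc/bed')` would return one pooled vector per WAV file in that folder, keyed by its path relative to the folder. Mean pooling is only one option; depending on the task, other pooling strategies may work better.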

Pretrain your models

1) Install Docker and docker-compose on your system. You'll also need to install the NVIDIA Container Toolkit to access GPUs from a Docker container.

2) Execute the start_docker.sh script

First, modify docker-compose.yml: in the volumes section, change the paths to match the ones on your system. You'll need a folder called datasets with the following subfolders:

  • audioset_24k/unbalanced_train
  • fma_large_24k
  • librilight_med_24k
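Before starting the container, it can help to confirm that the datasets folder has this layout. The following is a minimal sketch using only the standard library; the function name `missing_subfolders` is hypothetical, and the subfolder names are copied from the list above.

```python
from pathlib import Path

# Subfolders expected under the datasets folder (copied from the list above).
EXPECTED_SUBFOLDERS = [
    "audioset_24k/unbalanced_train",
    "fma_large_24k",
    "librilight_med_24k",
]


def missing_subfolders(datasets_root):
    """Return the expected subfolders that do not exist as directories under `datasets_root`."""
    root = Path(datasets_root)
    return [sub for sub in EXPECTED_SUBFOLDERS if not (root / sub).is_dir()]
```

An empty result from `missing_subfolders('/path/to/datasets')` means every expected subfolder is present; it does not check the contents of those folders.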

All audio files must be resampled to 24 kHz.
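To find files that still need resampling, you can read the sampling rate from each file's header. The sketch below uses Python's standard-library `wave` module, so it only handles uncompressed PCM `.wav` files; if your copies of the datasets are stored in other formats (such as MP3 or FLAC), you would need an audio library instead. The helper name `files_not_at_rate` is hypothetical.

```python
import wave
from pathlib import Path

TARGET_RATE = 24_000  # Hz, the sampling rate required for pretraining


def files_not_at_rate(folder, target_rate=TARGET_RATE):
    """Yield (path, rate) for each .wav file under `folder` whose sampling rate differs from target_rate."""
    for path in sorted(Path(folder).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate != target_rate:
            yield path, rate
```

For example, looping over `files_not_at_rate('datasets/fma_large_24k')` and printing each pair lists the files that still need conversion. Only the header is read, so the scan is fast even on large folders.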

You may also need to modify device_ids if you have a different number of GPUs.

Then, run:

chmod +x start_docker.sh
./start_docker.sh

This will build the encodecmae image, start a container using docker compose, and attach to it.

3) Install the encodecmae package inside the container

cd workspace/encodecmae
pip install -e .

4) Run the training script

chmod +x scripts/run_pretraining.sh
scripts/run_pretraining.sh

The training script uses ginpipe, my own library for executing pipelines configured with gin. By modifying the config files (those with the .gin extension), you can control aspects of the training and the model configuration. I plan to explain my approach to ML pipelines, and how to use gin and ginpipe, in a future blog article. Stay tuned!