This repository contains implementations of models that invert EnCodecMAE features back to the waveform domain.
We provide pretrained weights for many of our models, and this Colab notebook demonstrates how to use them.
Model Name | Upstream | Summary Window | Training Data | Model Type
---|---|---|---|---
ecmae2ec-base-1LTransformer | EnCodecMAE Base | None | AS + LL + FMA | Regressor
DiffTransformerAE2L8L1CLS-10s | EnCodecMAE Base | 10s | FMA + Jamendo | Diffusion
DiffTransformerAE2L8L1CLS-4s | EnCodecMAE Base | 4s | FMA | Diffusion

Training data abbreviations: AS = AudioSet, LL = LibriLight, FMA = Free Music Archive.
To train a model, follow these steps:
- Gather the training datasets and put them in a folder. All audio should be sampled at 24 kHz; a resampling example is sketched below.
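  If your audio is at a different rate, something like the following can resample it (this assumes `ffmpeg` is installed and the files are WAVs under `datasets/`; adjust paths and extensions to your data):

  ```bash
  # Resample every WAV to 24 kHz, writing the results to datasets_24k/
  mkdir -p datasets_24k
  for f in datasets/*.wav; do
    ffmpeg -i "$f" -ar 24000 "datasets_24k/$(basename "$f")"
  done
  ```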
- Install Docker and Docker Compose.
- Clone this repository and the [encodecmae](https://github.com/habla-liaa/encodecmae) repository.
- Edit the docker-compose file: modify the paths in `volumes` so that they point to the encodecmae repository, this repository, and the folder with the datasets. These folders will be mounted inside the container under `/workspace`. Also update `device_ids` to select the GPUs that the container should use for training. A sketch of the relevant section is shown below.
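  For orientation, the relevant part of the compose file might look roughly like this (the host paths, service name, and exact layout here are assumptions; follow the structure of the file actually shipped in this repo):

  ```yaml
  services:
    encodecmae-to-wav-train:
      volumes:
        # Host paths on the left are placeholders; adjust to your machine.
        - /path/to/encodecmae:/workspace/encodecmae
        - /path/to/encodecmae-to-wav:/workspace/encodecmae-to-wav
        - /path/to/datasets:/workspace/datasets
      deploy:
        resources:
          reservations:
            devices:
              - driver: nvidia
                device_ids: ["0", "1"]  # GPUs visible inside the container
                capabilities: [gpu]
  ```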
- Update the paths in `configs/datasets` as needed; a hypothetical example follows.
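  As a purely illustrative sketch (the real key names and file format are whatever the files under `configs/datasets` use; this only shows the idea of pointing the configs at the mounted folder):

  ```yaml
  # Hypothetical dataset config: the keys shown here are assumptions.
  dataset_path: /workspace/datasets/fma
  sample_rate: 24000
  ```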
- Inside this repository's folder, run:

  ```bash
  docker compose up -d
  docker attach encodecmae-to-wav-train
  ```
- An interactive shell will open. Run:

  ```bash
  cd /workspace/encodecmae
  pip install -e .
  cd /workspace/encodecmae-to-wav
  pip install -e .
  ```
- Check that the datasets appear in `/workspace/datasets` (e.g. with `ls /workspace/datasets`).
- Navigate to `/workspace/encodecmae-to-wav/encodecmae-to-wav`.
- Make the training script executable:

  ```bash
  chmod +x scripts/train.sh
  ```
- In `scripts/train.sh` you will find a list of commands, each corresponding to a different experiment. Comment out everything except the experiment to be run. The batch size and other parameters can be modified via the `--mods` argument or by editing this config; a hypothetical example is sketched below.
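  As a rough illustration only (the script name, config path, and key names below are invented placeholders, not this repo's actual CLI; check `scripts/train.sh` for the real commands and keys):

  ```bash
  # Hypothetical experiment line: names and keys are assumptions.
  python train.py --config configs/models/diff_transformer_10s.gin \
                  --mods "TRAIN_BATCH_SIZE=16" "LEARNING_RATE=1e-4"
  ```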
- Run `scripts/train.sh` and training should start.
If you use this code or our results in your paper, please cite our work:
```bibtex
@article{alonso2024leveraging,
  title={Leveraging pre-trained autoencoders for interpretable prototype learning of music audio},
  author={Alonso Jim{\'e}nez, Pablo and Pepino, Leonardo and Batlle-Roca, Roser and Zinemanas, Pablo and Serra, Xavier and Rocamora, Mart{\'\i}n},
  year={2024},
  publisher={Institute of Electrical and Electronics Engineers (IEEE)}
}

@article{pepino2023encodecmae,
  title={EnCodecMAE: Leveraging neural codecs for universal audio representation learning},
  author={Pepino, Leonardo and Riera, Pablo and Ferrer, Luciana},
  journal={arXiv preprint arXiv:2309.07391},
  year={2023}
}
```