This repository contains implementations of models that invert EnCodecMAE features back to the waveform domain.
We provide pretrained weights for many of our models, and this Colab notebook demonstrates how to use them.
Model Name | Upstream | Summary Window | Training Data | Model Type
---|---|---|---|---
ecmae2ec-base-1LTransformer | EnCodecMAE Base | None | AS + LL + FMA | Regressor
DiffTransformerAE2L8L1CLS-10s | EnCodecMAE Base | 10s | FMA + Jamendo | Diffusion
DiffTransformerAE2L8L1CLS-4s | EnCodecMAE Base | 4s | FMA | Diffusion

Training data abbreviations: AS = AudioSet, LL = LibriLight, FMA = Free Music Archive.
To train a model, follow these steps:
- Gather the training datasets and put them in a folder. All audio should be sampled at 24 kHz; a resampling example is sketched below.
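  If your audio is at a different rate, something like the following can resample it (this assumes `ffmpeg` is installed and the files are WAVs under `datasets/`; adjust paths and extensions to your data):

  ```bash
  # Resample every WAV to 24 kHz, writing the results to datasets_24k/
  mkdir -p datasets_24k
  for f in datasets/*.wav; do
    ffmpeg -i "$f" -ar 24000 "datasets_24k/$(basename "$f")"
  done
  ```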
- Install Docker and Docker Compose.
- Clone this repository and the [encodecmae](https://github.com/habla-liaa/encodecmae) repository.
- Edit the docker-compose file: modify the paths in `volumes` so that they point to the encodecmae repository, this repository, and the folder with the datasets. These folders will be mounted inside the container under `/workspace`. Also update `device_ids` to select the GPUs that the container should use for training. A sketch of the relevant section is shown below.
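  For orientation, the relevant part of the compose file might look roughly like this (the host paths, service name, and exact layout here are assumptions; follow the structure of the file actually shipped in this repo):

  ```yaml
  services:
    encodecmae-to-wav-train:
      volumes:
        # Host paths on the left are placeholders; adjust to your machine.
        - /path/to/encodecmae:/workspace/encodecmae
        - /path/to/encodecmae-to-wav:/workspace/encodecmae-to-wav
        - /path/to/datasets:/workspace/datasets
      deploy:
        resources:
          reservations:
            devices:
              - driver: nvidia
                device_ids: ["0", "1"]  # GPUs visible inside the container
                capabilities: [gpu]
  ```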
- Update the paths in `configs/datasets` as needed; a hypothetical example follows.
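  As a purely illustrative sketch (the real key names and file format are whatever the files under `configs/datasets` use; this only shows the idea of pointing the configs at the mounted folder):

  ```yaml
  # Hypothetical dataset config: the keys shown here are assumptions.
  dataset_path: /workspace/datasets/fma
  sample_rate: 24000
  ```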
- Inside this repository's folder, run:

  ```bash
  docker compose up -d
  docker attach encodecmae-to-wav-train
  ```
- An interactive shell will open. Run:

  ```bash
  cd /workspace/encodecmae
  pip install -e .
  cd /workspace/encodecmae-to-wav
  pip install -e .
  ```
- Check that the datasets appear in `/workspace/datasets` (e.g. with `ls /workspace/datasets`).
- Navigate to `/workspace/encodecmae-to-wav/encodecmae-to-wav`.
- Make the training script executable:

  ```bash
  chmod +x scripts/train.sh
  ```
- In `scripts/train.sh` you will find a list of commands, each corresponding to a different experiment. Comment out everything except the experiment to be run. The batch size and other parameters can be modified via the `--mods` argument or by editing this config; a hypothetical example is sketched below.
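  As a rough illustration only (the script name, config path, and key names below are invented placeholders, not this repo's actual CLI; check `scripts/train.sh` for the real commands and keys):

  ```bash
  # Hypothetical experiment line: names and keys are assumptions.
  python train.py --config configs/models/diff_transformer_10s.gin \
                  --mods "TRAIN_BATCH_SIZE=16" "LEARNING_RATE=1e-4"
  ```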
- Run `scripts/train.sh` and training should start.
If you use this code or our results in your paper, please cite our work:
```bibtex
@article{alonso2024leveraging,
  title={Leveraging pre-trained autoencoders for interpretable prototype learning of music audio},
  author={Alonso Jim{\'e}nez, Pablo and Pepino, Leonardo and Batlle-Roca, Roser and Zinemanas, Pablo and Serra, Xavier and Rocamora, Mart{\'\i}n},
  year={2024},
  publisher={Institute of Electrical and Electronics Engineers (IEEE)}
}

@article{pepino2023encodecmae,
  title={EnCodecMAE: Leveraging neural codecs for universal audio representation learning},
  author={Pepino, Leonardo and Riera, Pablo and Ferrer, Luciana},
  journal={arXiv preprint arXiv:2309.07391},
  year={2023}
}
```