A Mel-Band RoFormer vocal separation model. It performs slightly better than the paper's equivalent because it was trained on more data.
Download the model - https://huggingface.co/KimberleyJSN/melbandroformer/blob/main/MelBandRoformer.ckpt
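If you prefer to script the download, the huggingface_hub client can fetch the same checkpoint (note: huggingface_hub may not be in requirements.txt, so install it separately if needed):

```python
# Optional: fetch the checkpoint programmatically rather than via the browser.
# Requires huggingface_hub (pip install huggingface_hub) if not already installed.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="KimberleyJSN/melbandroformer",
    filename="MelBandRoformer.ckpt",
)
print(ckpt_path)  # cached local path; pass this to --model_path
```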
Install requirements - pip install -r requirements.txt
Inference - python inference.py --config_path configs/config_vocals_mel_band_roformer.yaml --model_path MelBandRoformer.ckpt --input_folder songsfolder --store_dir outputsfolder
The model will output a vocals file and an instrumental file for every .wav file inside the --input_folder and save them to the --store_dir folder.
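To sanity-check a batch from a script, a small loop like the one below works; the glob pattern is an assumption, since the exact output filenames are decided by inference.py:

```python
# Hypothetical check that every input .wav produced output stems.
from pathlib import Path

input_folder = Path("songsfolder")
store_dir = Path("outputsfolder")

for wav in input_folder.glob("*.wav"):
    # Assumed naming: outputs share the input's stem (adjust the pattern
    # to match what inference.py actually writes on your machine).
    stems = sorted(store_dir.glob(f"{wav.stem}*"))
    print(wav.name, "->", [s.name for s in stems] or "no outputs found")
```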
num_overlap - Increasing this value can improve output quality by reducing the artifacts created when the chunks are stitched back together (see the sketch below this list). It also makes inference slower; there is no need to go higher than 8.
chunk_size - The number of audio samples fed to the model per chunk. The default is 352800, which is 8 seconds at a 44.1 kHz sample rate (352800 / 44100 = 8); the same chunk size was used to train the model.
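These two settings interact: inference slides a chunk_size window across the audio with a hop of chunk_size / num_overlap and averages the overlapping predictions. Below is a minimal NumPy sketch of that overlap-add idea; separate_chunk() is a placeholder standing in for the model call, and the real code in inference.py may differ in details (windowing, batching, stereo handling):

```python
import numpy as np

def separate_chunk(chunk: np.ndarray) -> np.ndarray:
    """Stand-in for the model forward pass; returns the input unchanged."""
    return chunk

def chunked_separation(audio: np.ndarray,
                       chunk_size: int = 352800,
                       num_overlap: int = 4) -> np.ndarray:
    """Slide a chunk_size window by chunk_size // num_overlap and average
    overlapping outputs, smoothing artifacts at chunk boundaries."""
    hop = chunk_size // num_overlap
    out = np.zeros(len(audio), dtype=np.float64)
    counts = np.zeros(len(audio), dtype=np.float64)
    for start in range(0, len(audio), hop):
        end = min(start + chunk_size, len(audio))
        chunk = audio[start:end]
        if end - start < chunk_size:  # zero-pad the final, shorter chunk
            chunk = np.pad(chunk, (0, chunk_size - (end - start)))
        result = separate_chunk(chunk)
        out[start:end] += result[:end - start]
        counts[start:end] += 1.0
    return out / counts  # every sample is covered at least once
```

With num_overlap = 4, each sample is averaged over up to four overlapping predictions; raising it to 8 halves the hop and roughly doubles inference time, which is why higher values stop paying off.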
lucidrains for implementing the paper (and all his open source work)
ZFTurbo for releasing the training code that was used to train the model
Ju-Chiang Wang, Wei-Tsung Lu, Minz Won (ByteDance AI Labs) - The authors of the Mel-Band RoFormer paper
aufr33 + Anjok + bascurtiz for contributing to the dataset
If you would like to contribute GPU access so I can train better models, please contact me at KimberleyJensenOx1@gmail.com. A 40GB GPU is required due to high VRAM usage. You can also contribute by adding to the dataset used to train the model.
Colab notebook - https://colab.research.google.com/drive/1tyP3ZgcD443d4Q3ly7LcS3toJroLO5o1?usp=sharing