mhrice/RemFx

A model for de-autotuning (reverse autotuning)

francqz31 opened this issue · 1 comments

Hello authors, this just occurred to me, but it would be really awesome to have a model that de-autotunes vocals (reverse autotuning)!
I can train it myself if you don't, but I would need some information or a guide on how to train it using this repository: how much data is needed for a sufficiently well-trained model that produces eye-catching results, how many training steps are needed, and how many hours it would take on an A100.

Thanks in advance.

mhrice commented

Hi francqz31,
This is a neat idea! However, I think de-autotuning vocals would require a different architecture and approach from the models in this project. Our main models are designed to filter in the time/frequency domain to achieve their results, whereas you would likely need a model that can perform audio generation to reproduce the de-autotuned vocals. Therefore, I think this task is out of the scope of this project. I'd recommend checking out diffusion models for this purpose, such as this project: https://github.com/archinetai/audio-diffusion-pytorch or this project: https://github.com/eloimoliner/CQTdiff
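To make the filtering-vs-generation distinction concrete, here is a toy numpy sketch (not from this repository; the 440 Hz "autotuned" pitch and 455 Hz "original" pitch are made-up values for illustration). A masking model in the time/frequency domain can only rescale energy that is already present in the input, so no mask can move a note from its corrected pitch back to the original pitch; that missing content has to be generated.

```python
import numpy as np

sr = 16000                       # sample rate (Hz)
t = np.arange(sr) / sr           # 1 second of audio
x = np.sin(2 * np.pi * 440 * t)  # "autotuned" input: a pure 440 Hz tone

spec = np.fft.rfft(x)

# A filtering/masking model predicts a per-bin gain and multiplies it in.
# Use an arbitrary mask here; the argument holds for any gains whatsoever.
rng = np.random.default_rng(0)
mask = rng.uniform(0, 1, size=len(spec))
y = np.fft.irfft(mask * spec, n=len(x))
Y = np.abs(np.fft.rfft(y))

b440 = int(round(440 * len(x) / sr))  # bin of the autotuned pitch
b455 = int(round(455 * len(x) / sr))  # bin of a hypothetical original pitch

# The mask can attenuate the 440 Hz tone, but it cannot create energy at
# 455 Hz, because the input has (numerically) none there. Shifting the
# pitch back would require generating new content, not just filtering.
print(f"energy at 440 Hz: {Y[b440]:.3f}, energy at 455 Hz: {Y[b455]:.2e}")
```

This is why the masking-style models in this repo fit effect *removal by attenuation*, while de-autotuning looks more like the conditional generation tasks the diffusion repos above target.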