AppleHolic/source_separation

How to load the model? What is the model name?

Closed this issue · 4 comments

I can't seem to find the correct name to load the singing voice separation model. I tried almost all the combinations from settings.py and other places, for example:

python synthesize.py separate test.wav ./ refine_spectrogram_unet ./

but it doesn't work with refine_spectrogram_unet or refine_unet_larger.

So I am not sure how to load the model; it basically gets stuck at the model = build_model(model_name).cuda() line.

I also placed the singing voice separation checkpoint downloaded from Google Drive, RefineSpectrogramUnet.best.chkpt, in the default folder.

Any help on this? Thanks!

@danielkorg To solve this issue, the 'register_model_architecture' and 'register_model' functions must be called before calling 'build_model'. I just added that line to fix it; I couldn't find another way.
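For anyone hitting the same wall: build_model fails here because it looks the model name up in a registry that is only populated when the registration functions run. A minimal, self-contained sketch of that registry pattern (identifiers simplified for illustration; the repository's own registry lives in its model modules and is not reproduced exactly here):

```python
# Simplified registry pattern; names are illustrative, not the repo's exact API.
_MODELS = {}

def register_model(name):
    """Decorator that records a model class under a lookup name."""
    def wrapper(cls):
        _MODELS[name] = cls
        return cls
    return wrapper

def build_model(name):
    """Instantiate a registered model; fails if registration never ran."""
    if name not in _MODELS:
        raise KeyError(f'unknown model: {name!r} (was register_model called?)')
    return _MODELS[name]()

@register_model('refine_spectrogram_unet')
class RefineSpectrogramUnet:
    pass

# Works only because the decorator above already executed.
model = build_model('refine_spectrogram_unet')
```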

By the way, for running it, I plan to add a Colab notebook (#19 ) for running inference with these models.

I managed to get it working! But now my question is about the output wav file:

  1. Why is the output mono only? Isn't singing voice separation trained on DSD100 supposed to be stereo? Is there an easy fix for this somewhere in the code, or does it need a brand-new stereo model?

  2. In a typical source separation scenario, especially singing voice separation, it would also be nice to generate the accompaniment (music only). Is there an easy fix for this somewhere in the code as well?

Thank you! Fantastic work so far! :)

@danielkorg

  1. Ah, the reason is simple. I didn't set out to build singing voice separation at first; I just wanted a simple singing voice separation built on top of the original code. I also think it could get better results using stereo channels, but I have no plan to build that in this repository.

  2. Actually, this repository focuses on extracting a single track from a song, because it uses a loss designed for reducing noise. The output could be changed to two channels, one for the vocal track and one for everything else, and the loss term could be adapted for two channels (see the sketch after this list for a simpler workaround). But I don't think that would give results meaningfully different from the original single-track setup, because noise is already part of the loss term. It depends on the model, the data, and the loss term, so it cannot be handled as simply as the first case.
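Not something this repository implements, but a quick workaround for the accompaniment question is to subtract the estimated vocal from the original mixture and keep the residual; the same idea extends to pseudo-stereo by running the mono model on each channel separately. A minimal sketch, assuming librosa and soundfile are installed and that 'test_vocals.wav' is a placeholder for whatever file the separation script actually writes:

```python
import librosa
import soundfile as sf

# Load the original mixture and the model's separated vocal at the same rate.
# 'test_vocals.wav' is a placeholder, not the repo's actual output filename.
mix, sr = librosa.load('test.wav', sr=None, mono=True)
vocals, _ = librosa.load('test_vocals.wav', sr=sr, mono=True)

# Trim to a common length, then take the residual as a rough accompaniment.
n = min(len(mix), len(vocals))
accompaniment = mix[:n] - vocals[:n]

sf.write('accompaniment.wav', accompaniment, sr)
```

This residual trick only works well if the vocal estimate is phase-aligned with the mixture (i.e., produced from the same file at the same sample rate), which is the usual case for this kind of inference script.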

I think this project has its limitations, so I will build a more efficient separation model in other work, after I finish the tasks that are more important to me :).

Closing this issue. If you have another problem, open a new issue or send me an email.