Would you please explain the inference process and what GRAD does?
Closed this issue · 3 comments
youngsuenXMLY commented
Hi, I've read the code, but I'm confused about two points in the inference part:
- Only one wav file is fed as input to the inference process. I would expect two input wav files: a source and a target. Could you please explain this?
- GRAD is really hard to understand. Could you give me some guidance?
Looking forward to your reply. Thank you!
marcoppasini commented
Hi!
- You must train the model with only one target style domain (voice or music genre), so at inference you don't need to feed a target sample to the model.
- The GRAD function is a gradient-based method for turning spectrograms back into waveforms, which I find works better than the traditional Griffin-Lim algorithm. You can read the cited paper for more details.
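To illustrate the first answer: because the target style is fixed at training time and lives in the generator's weights, a conversion call needs only the source spectrogram. This sketch is hypothetical throughout. `convert` and `toy_generator` are illustrative names, the toy generator is a fixed gain rather than a trained network, and running the generator on fixed-width, zero-padded time slices is an assumption, not a description of the repo's code.

```python
from typing import Callable, List

Spectrogram = List[List[float]]  # frames x frequency bins


def convert(source: Spectrogram,
            generator: Callable[[Spectrogram], Spectrogram],
            chunk: int = 4) -> Spectrogram:
    # Only the source is passed in: the single target domain was fixed
    # when the generator was trained, so no target sample is needed here.
    if not source:
        return []
    n_bins = len(source[0])
    out: Spectrogram = []
    for start in range(0, len(source), chunk):
        piece = source[start:start + chunk]
        pad = chunk - len(piece)
        # Assumption: zero-pad the last slice so the generator always sees `chunk` frames.
        piece = piece + [[0.0] * n_bins for _ in range(pad)]
        # Keep only the frames that correspond to real (unpadded) input.
        out.extend(generator(piece)[:chunk - pad])
    return out


def toy_generator(spec: Spectrogram) -> Spectrogram:
    # Hypothetical stand-in for a trained source-to-target generator.
    return [[0.5 * v for v in frame] for frame in spec]


source = [[float(t + b) for b in range(3)] for t in range(10)]
converted = convert(source, toy_generator)
print(len(converted), converted[-1])
```

The point of the design is the signature: a one-to-one model maps "any input in domain A" to "domain B", so the target never appears as an argument. Any-to-any conversion would need an extra target input (for example a speaker embedding), which is a different kind of model.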
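To make the GRAD answer more concrete, here is a minimal sketch of gradient-based spectrogram inversion in plain Python. It is not the repo's GRAD function: it assumes a linear-magnitude STFT (no mel filterbank), a squared-error loss on magnitudes, and plain gradient descent with a hand-derived gradient, and the names `grad_invert`, `stft`, `stft_adjoint`, and `loss_and_grad` are hypothetical. It does show the idea described above: the waveform itself is the variable being optimized, and each step follows the gradient of a spectrogram-matching loss, whereas Griffin-Lim alternates STFT/ISTFT passes with magnitude replacement.

```python
import cmath
import math
import random


def hann(n):
    # Periodic Hann window of length n.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * m / n) for m in range(n)]


def dft_table(n):
    # tw[k][m] = exp(-2*pi*i*k*m/n), the DFT matrix.
    return [[cmath.exp(-2j * math.pi * k * m / n) for m in range(n)] for k in range(n)]


def stft(x, win, hop, tw):
    # Complex STFT with no padding: a full-length DFT of each windowed frame.
    n = len(win)
    frames = []
    for start in range(0, len(x) - n + 1, hop):
        seg = [win[m] * x[start + m] for m in range(n)]
        frames.append([sum(seg[m] * tw[k][m] for m in range(n)) for k in range(n)])
    return frames


def stft_adjoint(frames, win, hop, length, tw):
    # Adjoint of stft(): conjugate-transpose DFT, re-window, overlap-add.
    n = len(win)
    out = [0j] * length
    for t, frame in enumerate(frames):
        for m in range(n):
            acc = sum(frame[k] * tw[k][m].conjugate() for k in range(n))
            out[t * hop + m] += win[m] * acc
    return out


def loss_and_grad(x, target_mag, win, hop, tw):
    # L(x) = sum (|X| - S)^2 with X = stft(x); dL/dx = 2 * Re(A^H R),
    # where R = (|X| - S) * X / |X| (set to 0 where |X| == 0).
    X = stft(x, win, hop, tw)
    loss, R = 0.0, []
    for frame, mags in zip(X, target_mag):
        row = []
        for z, s in zip(frame, mags):
            a = abs(z)
            loss += (a - s) ** 2
            row.append((a - s) * z / a if a > 0 else 0j)
        R.append(row)
    g = stft_adjoint(R, win, hop, len(x), tw)
    return loss, [2.0 * v.real for v in g]


def grad_invert(target_mag, length, n_fft=16, hop=4, lr=0.02, iters=300, seed=0):
    # Gradient descent on the waveform, starting from small random noise.
    # Sufficient step bound for this sketch: lr < 1 / (n_fft * max overlap-added
    # squared window); for Hann with hop = n_fft / 4 that is 1 / (1.5 * n_fft).
    win, tw = hann(n_fft), dft_table(n_fft)
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 0.1) for _ in range(length)]
    history = []
    for _ in range(iters):
        loss, g = loss_and_grad(x, target_mag, win, hop, tw)
        history.append(loss)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x, history


# Usage: recover a signal from the STFT magnitudes of a two-tone reference.
n_fft, hop, length = 16, 4, 64
reference = [math.sin(2 * math.pi * 3 * i / 32) + 0.5 * math.sin(2 * math.pi * 5 * i / 32)
             for i in range(length)]
S = [[abs(z) for z in frame] for frame in stft(reference, hann(n_fft), hop, dft_table(n_fft))]
x, history = grad_invert(S, length, n_fft=n_fft, hop=hop)
print(f"loss: {history[0]:.2f} -> {history[-1]:.4f}")
```

In a practical setup one would likely compute the loss on a mel spectrogram and let an autodiff framework and an adaptive optimizer supply the gradient steps; the hand-written adjoint here only keeps the sketch dependency-free.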
CarolinGao commented
Hi!
Have you tried a WaveNet vocoder for turning spectrograms back into waveforms?
youngsuenXMLY commented
@CarolinGao Hi, I haven't followed this repo since this issue. MelGAN-VC works well for one-to-one VC, but I need any-to-many or any-to-any VC, which is more broadly applicable.