Would you please explain the inference process and what does GRAD do?

Question


Closed this issue 5 years ago · 3 comments

youngsuenXMLY commented 5 years ago

Hi, I read the code, but I got confused by two points in the inference part:

1. Only one WAV file is fed to the inference process. I would expect two input WAV files: one source and one target. Would you please explain this?
2. GRAD is really hard to understand. Could you give me some guidance?
Looking forward to your reply. Thank you!

Answer 1 · 2020-02-27T11:43:28.000Z

Hi!

1. You must train the model on a single target style domain (a voice or a music genre), so at inference you don't need to feed a target sample to the model.
2. The GRAD function is a gradient-based method for turning spectrograms back into waveforms, which I find works better than the traditional Griffin-Lim algorithm. You can read the cited paper for more details.
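To illustrate why a single input is enough: in a one-to-one setup the target domain is baked into the trained generator's weights, so inference only needs the source spectrogram. The sketch below assumes a generator that works on fixed-width time slices of a (frequency bins, frames) spectrogram; `convert`, `generator`, and `chunk_width` are hypothetical names, not the repository's API, and the lambda used in the usage example is only a placeholder for a trained network.

```python
import numpy as np
from typing import Callable


def convert(source_spec: np.ndarray,
            generator: Callable[[np.ndarray], np.ndarray],
            chunk_width: int) -> np.ndarray:
    """Translate a (freq_bins, frames) source spectrogram into the trained target domain.

    Hypothetical sketch: `generator` stands in for a trained one-to-one model
    that maps a (freq_bins, chunk_width) slice to a slice of the same shape.
    """
    n_bins, n_frames = source_spec.shape
    # Zero-pad the time axis so it splits evenly into fixed-width chunks
    pad = (-n_frames) % chunk_width
    padded = np.pad(source_spec, ((0, 0), (0, pad)))
    # Only the source is needed: no target sample is passed anywhere
    chunks = [generator(padded[:, i:i + chunk_width])
              for i in range(0, padded.shape[1], chunk_width)]
    converted = np.concatenate(chunks, axis=1)
    return converted[:, :n_frames]  # drop the padding again


if __name__ == "__main__":
    spec = np.random.default_rng(0).random((80, 50))
    # Placeholder "generator" that just halves the magnitudes
    out = convert(spec, lambda s: 0.5 * s, chunk_width=24)
    print(out.shape)
```

The converted spectrogram would then still need to be turned back into audio, which is the job of GRAD (or Griffin-Lim, or a neural vocoder).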
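As a rough idea of what "gradient-based" inversion means, here is a minimal sketch, not the repository's GRAD function: start from low-amplitude noise and run Adam on the waveform samples to minimize the squared difference between the waveform's STFT magnitude and the target magnitude. Griffin-Lim, by contrast, alternates between imposing the target magnitude and re-estimating a consistent signal. The Hann window, hop size, step count, learning rate, and the names `magnitude_spec`, `loss_and_grad`, and `invert_spectrogram` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical sketch of gradient-based spectrogram inversion (not the repo's GRAD).


def frames_from(x, n_fft, hop):
    """Return overlapping frames of x (no padding) and their sample indices."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx], idx


def magnitude_spec(x, window, hop):
    """Two-sided STFT magnitude, shape (n_frames, n_fft)."""
    frames, _ = frames_from(x, len(window), hop)
    return np.abs(np.fft.fft(frames * window, axis=1))


def loss_and_grad(x, target, window, hop):
    """Squared magnitude error against target and its gradient w.r.t. x."""
    n_fft = len(window)
    frames, idx = frames_from(x, n_fft, hop)
    X = np.fft.fft(frames * window, axis=1)
    mag = np.abs(X)
    diff = mag - target
    loss = float(np.sum(diff ** 2))
    unit = X / np.maximum(mag, 1e-12)  # X/|X|, guarded against zero bins
    # The adjoint of the DFT is n_fft * ifft; frames are real, so keep the real part
    g_frames = 2.0 * n_fft * np.real(np.fft.ifft(diff * unit, axis=1)) * window
    grad = np.zeros_like(x)
    np.add.at(grad, idx, g_frames)  # overlap-add frame gradients onto samples
    return loss, grad


def invert_spectrogram(target, window, hop, n_samples, steps=300, lr=0.01, seed=0):
    """Search for a waveform whose STFT magnitude matches target, using Adam."""
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.standard_normal(n_samples)  # small random initial waveform
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        _, g = loss_and_grad(x, target, window, hop)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    sr, n = 8000, 4096
    t = np.arange(n) / sr
    clean = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 660 * t)
    window, hop = np.hanning(256), 64
    target = magnitude_spec(clean, window, hop)
    start = 0.01 * np.random.default_rng(0).standard_normal(n)  # same init as below
    recon = invert_spectrogram(target, window, hop, n)
    print("loss before:", loss_and_grad(start, target, window, hop)[0])
    print("loss after: ", loss_and_grad(recon, target, window, hop)[0])
```

The two-sided FFT is used only because its adjoint is simply `n_fft * ifft`, which keeps the hand-written gradient short; a framework with automatic differentiation would compute this gradient for you. The recovered waveform matches the target magnitude, not necessarily the original phase.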

Answer 2 · 2020-06-30T14:37:42.000Z

Hi!
Have you tried the WaveNet vocoder for converting spectrograms back into waveforms?

Answer 3 · 2020-07-02T02:01:09.000Z

@CarolinGao Hi, I haven't followed this repo since this issue. MelGAN-VC works well for one-to-one VC, but I need any-to-many or any-to-any VC, which is more broadly applicable.