Parallel Wavenet-Vocoder
twidddj opened this issue · 14 comments
Planned TODO
- KL + Power loss - Single speaker
Properties not specified in the paper
- Number of samples for the loss (we may be limited by GPU memory)
- Number of mixture components for the IAF layers
- Averaging method for the Power loss
- e.g., just `reduce_mean` over the time axis, or a moving average, or ... (see the sketch after this list)
- ... (If you know any of these, please let us know.)
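For the Power loss point, here is a minimal sketch of the "reduce_mean over the time axis" option, assuming a TF version that provides `tf.signal.stft`; the frame length and hop are illustrative choices, not values from the paper:

```python
# A minimal sketch of one averaging option for the power loss; frame_length
# and frame_step are illustrative assumptions, not values from the paper.
import tensorflow as tf

def power_loss(student_wav, target_wav, frame_length=512, frame_step=128):
    """Squared difference of |STFT|^2, averaged over the frame (time) axis."""
    def avg_power(wav):                                # wav: [batch, samples]
        stft = tf.signal.stft(wav, frame_length, frame_step)
        power = tf.abs(stft) ** 2                      # [batch, frames, bins]
        return tf.reduce_mean(power, axis=1)           # average over time
    return tf.reduce_mean(tf.squared_difference(
        avg_power(student_wav), avg_power(target_wav)))
```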
Other implementations
- https://github.com/zhf459/P_wavenet_vocoder (used r9y9's wavenet in pytorch)
Sadly, there are many details missing from the paper; I find nobody has been able to reproduce the results.
Most of all, I'm not sure mel-spectrograms will work as well as the linguistic features used in the paper. We may have to add other constraints to make up for the weakness of mel features.
I do not think mel features are the key problem; I think the IAF and probability density distillation are very important for the quality.
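Agreed that the distillation is the crux. For what it's worth, here is a rough sketch of the KL term as I understand it; `student.sample_with_log_prob` and `teacher.log_prob` are hypothetical helpers, not functions from any of the linked repos:

```python
# A rough sketch only. student.sample_with_log_prob is assumed to draw
# x ~ P_S differentiably (via the IAF reparameterization) and return
# log P_S(x); teacher.log_prob scores x under the frozen teacher P_T.
# The teacher's variables must be excluded from the optimizer so only the
# student is updated, while gradients still flow through the sample x.
import tensorflow as tf

def distillation_loss(student, teacher, conditioning, num_samples=4):
    kl_terms = []
    for _ in range(num_samples):  # Monte Carlo estimate over student samples
        x, log_p_student = student.sample_with_log_prob(conditioning)
        log_p_teacher = teacher.log_prob(x, conditioning)
        # KL(P_S || P_T) ~= E_{x~P_S}[log P_S(x) - log P_T(x)]
        kl_terms.append(log_p_student - log_p_teacher)
    return tf.reduce_mean(tf.stack(kl_terms))
```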
@neverjoe could you tell me how to connect the teacher model to evaluate the student? Are they both trained in one session? Thanks.
Hi! How can I keep the teacher's network parameters from being updated during training? Thanks.
@maozhiqiang
"tf.stop_gradient" in tensorflow
Thank you @xuerq!
@neverjoe still working on it; the wav sampled from the student is not as good as the teacher's.
@twidddj did you get reasonable results with parallel WaveNet?
Hi @maozhiqiang, we are still trying to get better results. We have some results, but they are not good enough yet. I attached the results here. Thanks for your interest in our project!
Hi @twidddj! Thank you very much for your reply! Did you use the KL loss?