twidddj/tf-wavenet_vocoder

Parallel Wavenet-Vocoder

twidddj opened this issue · 14 comments

Planned TODO

  • KL + Power - Single speaker

Properties not specified in the paper

  • Number of samples for the loss (we may be limited by GPU memory)
  • Number of mixture components for the IAF layers
  • Averaging method for the power loss
    • e.g. a plain reduce_mean over the time axis, a moving average, or ..
  • .. (please let us know of any others)
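To make the third open question concrete, here is a small NumPy sketch of two plausible readings of "averaging method for the power loss": a plain mean over the time (frame) axis versus an exponential moving average of frame powers. The frame length, hop, and decay are arbitrary stand-ins, not values from the paper or this repo.

```python
import numpy as np

def frame_power(x, frame_len=256, hop=128):
    """Per-frame signal power (a crude proxy for |STFT|^2 averaged over bins)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.mean(frames ** 2, axis=1)  # shape: (n_frames,)

def power_loss_mean(student, teacher, **kw):
    """Option 1: plain reduce_mean over the time (frame) axis."""
    return np.mean((frame_power(student, **kw) - frame_power(teacher, **kw)) ** 2)

def power_loss_ema(student, teacher, decay=0.9, **kw):
    """Option 2: exponentially smooth frame powers before comparing them."""
    def ema(p):
        out, acc = [], p[0]
        for v in p:
            acc = decay * acc + (1 - decay) * v
            out.append(acc)
        return np.array(out)
    ps = ema(frame_power(student, **kw))
    pt = ema(frame_power(teacher, **kw))
    return np.mean((ps - pt) ** 2)

t = np.linspace(0, 1, 4096)
teacher = np.sin(2 * np.pi * 220 * t)
student = 0.8 * np.sin(2 * np.pi * 220 * t)  # quieter student -> nonzero loss
print(power_loss_mean(student, teacher), power_loss_ema(student, teacher))
```

Both options are zero when student and teacher powers match; the EMA variant mainly changes how sharply local power mismatches are penalized.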

Other implementations

Sadly, many details are missing from the paper; as far as I can tell, nobody has been able to reproduce the results.

Above all, I'm not sure mel-spectrograms are a good substitute for the linguistic features used in the paper. We may need additional constraints to make up for the weaknesses of mel features.

I don't think mel features are the key problem; in my view, the IAF and probability density distillation are critical for quality.
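For readers unfamiliar with the IAF student mentioned above: it turns white noise into audio through a stack of affine flows, so every sample can be produced in parallel. The sketch below uses random stand-ins for the shift/scale networks (in the real model they are WaveNets conditioned on earlier noise samples and the mel features); shapes and flow count are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000          # number of audio samples
n_flows = 4       # stacked IAF flows, as in parallel WaveNet

# Start from noise; each flow shifts and scales it.
z = rng.standard_normal(T)
log_det = np.zeros(T)  # running log|dx/dz|, needed for the student's entropy term

for _ in range(n_flows):
    # Stand-ins: in the real model, mu and log_s come from an autoregressive
    # network over z[<t] plus conditioning features, not from random draws.
    mu = 0.1 * rng.standard_normal(T)
    log_s = 0.05 * rng.standard_normal(T)
    z = z * np.exp(log_s) + mu      # affine transform x = z * s + mu
    log_det += log_s                # log-determinant accumulates across flows

x = z  # final waveform sample; no ancestral (sample-by-sample) loop needed
```

The key point is the last line: unlike the teacher WaveNet, generation requires no sequential loop over time, which is what makes the student fast.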

@neverjoe could you tell me how to connect the teacher model to evaluate the student? Are they both trained in one session? Thanks.

Hi! How do you keep the teacher's network parameters from being updated during training? Thanks.

xuerq commented

@maozhiqiang
Use "tf.stop_gradient" in TensorFlow.
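To illustrate the idea behind that answer: the teacher's output is treated as a constant inside the distillation loss, so gradients only flow into the student. Below is a NumPy sketch of a per-step KL term between categorical outputs; the names and toy logits are mine, and the comment marks where `tf.stop_gradient` would go in actual TensorFlow code.

```python
import numpy as np

def log_softmax(a):
    """Numerically stable log-softmax over the last axis."""
    a = a - a.max(axis=-1, keepdims=True)
    return a - np.log(np.exp(a).sum(axis=-1, keepdims=True))

def kl_categorical(student_logp, teacher_logp):
    """KL(student || teacher) per time step, summed over the class axis.

    In TensorFlow you would wrap the teacher's log-probs in
    tf.stop_gradient(...) so no gradient reaches the teacher; here the
    teacher term is just a constant array, which has the same effect.
    """
    p = np.exp(student_logp)
    return np.sum(p * (student_logp - teacher_logp), axis=-1)

# Toy distributions over 4 classes at 3 time steps (illustrative values).
logits_s = np.array([[1.0, 0.0, 0.0, 0.0]] * 3)
logits_t = np.array([[1.0, 0.0, 0.0, 0.0]] * 3)

kl = kl_categorical(log_softmax(logits_s), log_softmax(logits_t))
print(kl)  # zero here, since student and teacher match exactly
```

An alternative to `tf.stop_gradient` is to pass only the student's variables as `var_list` to `optimizer.minimize(...)`, so the teacher's weights are never updated even though gradients are defined for them.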

Thank you, @xuerq!

xuerq commented

@neverjoe I'm still working on it; the audio sampled from the student is not as good as the teacher's.

@xuerq @neverjoe @twidddj How do you use the teacher model to evaluate the student's output? Do you assess it through the training process or through the generative process? Thanks!

@twidddj have you gotten reasonable results with parallel WaveNet?

Hi @maozhiqiang, we are still trying to get better results. We have some results, but they're not good enough yet. I attached them here. Thanks for your interest in our project!

Hi @twidddj! Thank you very much for your reply! Did you use the KL loss?