Good pre-trained weights anyone?

First, thank you very much @r9y9 and everyone for the great work!

Does anyone want to share pre-trained weights that sound good?

Particularly for LJSpeech, if possible. My training seems to be converging to a very high loss value. I would love to experiment with some sounds, and maybe figure out where I am going wrong in training.

Thanks in advance,
Duvte.

@duvtedudug I have trained on LJSpeech for 340k steps. Here is the link: ljspeech_340k_pth

https://www.dropbox.com/s/8qgcbd1mm2xsqgq/20180127_mixture_lj_checkpoint_step000410000_ema.pth?dl=0

Weights used to generate speech for https://r9y9.github.io/wavenet_vocoder/

@azraelkuan @r9y9
Thank you both very much!

@r9y9 Thanks for the checkpoint! Would it be possible to share the multispeaker checkpoint as well?
Thanks!

https://www.dropbox.com/s/d0qk4ow9uuh2lww/20180212_mixture_multispeaker_cmu_arctic_checkpoint_step000740000_ema.pth?dl=0

Here you are:) This is also the one used for the demo page.

@r9y9 Could you share the parameters/configuration for this checkpoint (20180212_mixture_multispeaker_cmu_arctic_checkpoint_step000740000_ema.pth)? I tried to generate voices with it, but the results were not as good as the ones you published.

My command line is as follows:
python synthesis.py checkpoint.pth --hparams="input_type=raw,gin_channels=16" --speaker-id=5

One of my results is attached:

generated.zip

@mfkfge Sounds like there's mismatch between mel-spectrogram and speaker ID. Did you use mel-spectrogram of speaker ID 5?

@r9y9 Yes. I tried with the mel-spectrogram of speaker ID 5 as well as that of speaker ID 6.

Oh, I see the problem. Can you try with --hparams="input_type=raw,gin_channels=16,sample_rate=16000"? The sample rate is 16 kHz for CMU ARCTIC.
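For anyone wondering why the sample rate matters here, below is a rough arithmetic sketch. It assumes the mismatch mainly shows up as the generated samples being written or played at a rate other than the 16 kHz the model was trained on. The 22050 Hz value is only an illustrative, assumed mismatched rate, and the helper names are hypothetical, not part of synthesis.py.

```python
def duration_seconds(num_samples, rate_hz):
    """Length of a clip when num_samples are played back at rate_hz."""
    return num_samples / rate_hz


def playback_speed_factor(model_rate_hz, playback_rate_hz):
    """How much faster (>1) or slower (<1) audio sounds at the wrong rate."""
    return playback_rate_hz / model_rate_hz


MODEL_RATE_HZ = 16000        # CMU ARCTIC rate, as stated in this thread
ASSUMED_WRONG_RATE_HZ = 22050  # assumption: an example of a mismatched rate

num_samples = 48000  # hypothetical length of a generated waveform
correct = duration_seconds(num_samples, MODEL_RATE_HZ)
wrong = duration_seconds(num_samples, ASSUMED_WRONG_RATE_HZ)
factor = playback_speed_factor(MODEL_RATE_HZ, ASSUMED_WRONG_RATE_HZ)

print(f"correct duration: {correct:.3f} s")
print(f"duration at mismatched rate: {wrong:.3f} s")
print(f"speed factor: {factor:.3f}x")
```

Under these assumptions the same samples play back noticeably faster and higher-pitched, which is one plausible reason the output sounds off until sample_rate=16000 is passed.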

@r9y9 Thanks! It sounds good with "sample_rate=16000".

I'm also having trouble generating good sound, even with sample_rate=16000.

My cmd line is:
python synthesis.py 20180212_mixture_multispeaker_cmu_arctic_checkpoint_step000740000_ema.pth ./ --conditional=cmu_arctic-mel-00001.npy --preset=20180212_multispeaker_cmu_arctic_mixture.json --symmetric-mels --speaker-id 5 --hparams="sample_rate=16000"

Am I doing the right thing?

@r9y9 Could you explain how to match a mel-spectrogram with a speaker ID? Does each speaker ID have its own specific mel-spectrogram?

@zctang See train.txt in your preprocessed data directory. It should contain speaker ID in the last column. See also

wavenet_vocoder/cmu_arctic.py

Line 126 in e8f10dd

return (audio_filename, mel_filename, timesteps, text, speaker_id)
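As a rough illustration of the lookup described above, here is a minimal sketch that groups mel-spectrogram files by speaker ID. The field order follows the tuple shown above. The "|" delimiter, the sample rows, and the function name are assumptions for illustration, so check them against your own train.txt.

```python
def mels_by_speaker(metadata_lines, delimiter="|"):
    """Group mel-spectrogram filenames by speaker ID.

    Assumes each row is laid out as:
    audio_filename, mel_filename, timesteps, text, speaker_id
    with the speaker ID in the last column.
    """
    groups = {}
    for line in metadata_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        mel_filename = fields[1]      # second column
        speaker_id = int(fields[-1])  # last column
        groups.setdefault(speaker_id, []).append(mel_filename)
    return groups


# Hypothetical rows, not taken from a real train.txt.
sample_rows = [
    "cmu_arctic-audio-00001.npy|cmu_arctic-mel-00001.npy|51200|some text|0",
    "cmu_arctic-audio-01200.npy|cmu_arctic-mel-01200.npy|48000|other text|5",
]
groups = mels_by_speaker(sample_rows)
print(groups.get(5, []))

# With a real file (the path is an assumption):
# with open("train.txt", encoding="utf-8") as f:
#     groups = mels_by_speaker(f)
```

Picking a file from groups[5] and passing it as the conditional input together with --speaker-id=5 should keep the mel-spectrogram and the speaker ID consistent.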

OK, I see. Thank you.

@skyw Did you preprocess the CMU ARCTIC dataset to generate cmu_arctic-mel-00001.npy, even though you are using the pre-trained model?