NVIDIA/mellotron

Adding another speaker

JakubReha opened this issue · 5 comments

I am trying to fine-tune the pre-trained LibriTTS model with one additional speaker. I added around 15 minutes of audio from this speaker to the train-clean-100 dataset, split its transcriptions 85:15 (train:validation), and increased the number of speakers to 124 in hparams.py. All the audio files were also resampled to 22,050 Hz, 16-bit. But when I run inference on the resulting checkpoints, I get only noise for all the speakers.
[Two screenshots attached: 2020-05-08 at 12:47:44 and 12:48:10]
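For reference, the 85:15 split described above could be done roughly as follows. This is a minimal sketch, not the repository's own tooling: it assumes filelist lines of the form `path|text|speaker_id`, and the function names, example paths, and speaker ID `9999` are hypothetical. The `speaker_ids` helper is only there so the number of distinct speakers in the combined training filelist can be compared against the speaker count set in hparams.py.

```python
import random


def split_filelist(lines, val_fraction=0.15, seed=1234):
    """Shuffle filelist entries and split them into (train, validation)."""
    entries = [ln.strip() for ln in lines if ln.strip()]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(entries)
    n_val = max(1, round(len(entries) * val_fraction))
    return entries[n_val:], entries[:n_val]


def speaker_ids(entries):
    """Collect distinct speaker IDs, assuming `path|text|speaker_id` lines."""
    return {e.split("|")[-1] for e in entries}


# Hypothetical entries for the new speaker (paths, text, and ID are made up).
new_entries = [
    f"new_speaker/utt_{i:03d}.wav|some transcription|9999" for i in range(100)
]
train, val = split_filelist(new_entries)
print(len(train), len(val))  # 85 15
print(speaker_ids(train))    # {'9999'}
```

Running `speaker_ids` over the full training filelist (original speakers plus the new one) gives a count that can be checked against the value configured in hparams.py.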

Check that the files are definitely 16 bit and have similar volume to the other speakers.
The source mel should have more detail, like this: [image]
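The bit-depth and volume check suggested above can be sketched with the Python standard library alone. This is an illustrative helper, not part of mellotron; the name `inspect_wav` is hypothetical, and the RMS level in dBFS is used here as one simple proxy for "volume". It only computes the level for 16-bit PCM files.

```python
import array
import math
import sys
import wave


def inspect_wav(path):
    """Return (sample_rate, bits_per_sample, channels, rms_dbfs) for a PCM WAV.

    rms_dbfs is None unless the file is 16-bit PCM with at least one frame.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()  # bytes per sample
        channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())

    rms_dbfs = None
    if width == 2 and frames:
        samples = array.array("h")  # signed 16-bit integers
        samples.frombytes(frames)
        if sys.byteorder == "big":  # WAV data is little-endian
            samples.byteswap()
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        rms_dbfs = 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
    return rate, width * 8, channels, rms_dbfs


if __name__ == "__main__":
    # Usage: python inspect_wav.py new_speaker/*.wav other_speaker/*.wav
    for wav_path in sys.argv[1:]:
        print(wav_path, inspect_wav(wav_path))
```

Comparing the reported levels for the new speaker's files against a few files from the existing speakers shows whether the volumes are in the same range. If `wave.open` raises `wave.Error` on a file, that file is likely not plain PCM (for example, 32-bit float), which is worth checking as well.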

The files are 16-bit, and the extra speaker is slightly louder than the others. However, the source mel still looks the same regardless of the speaker.

@JakubReha
The audio file path for the source mel is printed here. Would you be able to upload and/or check that audio file?

@JakubReha The mel-spectrogram looks suspicious. Can you share an audio file?

Closing due to inactivity.