Warm start from published model
Hi, has anyone tested a warm start from the published model? Can you please share your experience? I aim to train my model to improve inference of a male voice in an unseen language. Do we need to convert the model?
Still struggling with this issue. I've tried it with the converted model and without, with no success.
I've removed the iteration and optimizer handling because the published WaveGlow model doesn't contain them, as suggested here.
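Roughly, the warm-start loading I ended up with looks like the sketch below: only the model weights are restored from the published file, and the iteration counter and optimizer start fresh (the key names are what I found in the published file; your copy may differ):

```python
import torch

# Rough sketch of warm-start loading used in place of the stock load_checkpoint():
# restore only the model weights from the published file; iteration and optimizer
# state start fresh. Key names are assumptions based on the published file.
def warm_start_model(checkpoint_path, model):
    ckpt = torch.load(checkpoint_path, map_location='cpu')
    # the published WaveGlow file stores the whole module under a 'model' key
    saved_model = ckpt['model'] if isinstance(ckpt, dict) and 'model' in ckpt else ckpt
    model.load_state_dict(saved_model.state_dict())
    return model, 0  # iteration restarts at 0
```

With this, training counts iterations from zero while the flows are initialised from the published weights.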
I've also removed some lines related to multi-GPU training. I opened and closed an issue about that, just for the record, here.
My dataset is >11 hrs of a single male speaker; config.json is untouched other than checkpoint_path.
Still, I get only smooth noise at inference time. (The loss turns positive in some cases.)
@sharathadavanne, I see you suggested using the repo source as-is; do you recall any info that may be useful for me? What is the minimum number of iterations at which I could expect to hear something even slightly comprehensible?
Can you share some audio examples of a) the original, b) audio synthesized with WaveGlow trained from scratch, and c) audio synthesized using a warm start?
So from the attachment and the description above, I am guessing 'waveglow_v5' is synthesized using the model trained from scratch, and 'waveglow_train' is synthesized using a warm start?
How long did you train in the two cases, a) training from scratch and b) warm start?
Your training recordings sound robotic; are these real spoken recordings, or are they the output of some parametric TTS model? Anyway, I am more curious about the absolute zero-valued silences. These can hurt your WaveGlow training, since it randomly samples a one-second segment from each of your recordings, and in the scenario where such an absolute zero-valued segment is chosen, it can result in weird loss values (like NaN). Did you observe anything weird in your training curve?
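If you want to sanity-check this, a quick script along these lines (just a sketch using librosa and soundfile, not part of this repo; the sampling rate and threshold are examples) can trim the silent stretches at the ends of each file:

```python
import glob

import librosa
import soundfile as sf

# Sketch (not part of this repo): trim near-silent leading/trailing audio so the
# random one-second crops are less likely to land on pure silence.
for path in glob.glob('wavs/*.wav'):
    audio, sr = librosa.load(path, sr=22050)             # sampling rate assumed
    trimmed, _ = librosa.effects.trim(audio, top_db=40)  # 40 dB below peak counts as silence
    sf.write(path, trimmed, sr)
```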
Yes, it is a real spoken recording, though my dataset was converted from MP3; I hope that doesn't affect the result. I've seen the loss go positive numerous times, but I never once observed NaN.
The waveglow_v5 version is not the one trained from scratch, sorry for the confusion. I don't have a trained-from-scratch version; I am experimenting with the pre-trained models' behaviour. I trained the Tacotron model from a warm start on this dataset for 50k iterations (250 epochs), and that gave me the waveglow_v5 version (where the vocoder is just the published pretrained model).
As you mentioned here, I hear the exploding vowels or hoarseness you describe. Later on, in follow-ups, you suggest this can be corrected by training WaveGlow, and that it is much faster to train starting from the pretrained WaveGlow model.
The thing is, you said you trained for 500k iterations in 2 days. That is more GPU power than I have.
@sharathadavanne @ksaidin I am training waveglow for a different language and from scratch. Is there a way to warm start from a pretrained waveglow model like v1-v5?
@AnkurDebnath35 in my experience you don't have to worry about language or gender when training WaveGlow in warm-start mode. So go ahead and train with a warm start.
I have already started training WaveGlow from scratch on a Hindi dataset of 9k utterances for 870 epochs, and the loss has come down to -5.8 to -6.0 in 25K iterations. But what I wanted to ask is how to do a warm start; I couldn't find any parameter for that. Please help.
Download the pre-trained WaveGlow model given in the repo, and set the 'checkpoint_path' variable of the config file to the path of the downloaded model.
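In other words, the only edit is the checkpoint_path entry; for example, something like this (the 'train_config' section name and the paths are assumptions, and you can of course just edit config.json by hand instead):

```python
import json

# Example only: point the config's checkpoint_path at the downloaded
# pre-trained WaveGlow so train.py resumes from it. The 'train_config'
# section name and the file paths are assumptions.
with open('config.json') as f:
    config = json.load(f)

config['train_config']['checkpoint_path'] = '/path/to/waveglow_256channels_universal_v5.pt'

with open('config.json', 'w') as f:
    json.dump(config, f, indent=4)
```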
Thanks a lot @sharathadavanne. By the way, does it help the model converge faster? I am running on a single GPU and training is slow, although I can run distributed. So, if I warm-start and run distributed, how many epochs should be sufficient?
It definitely converges faster, no doubt about that. I haven't timed it, so I don't have an estimate.
I have been reading some issues reported here and gather that at least 100 to 500 epochs are needed. Can you suggest something?
@AnkurDebnath35 warm-start from the pre-trained Waveglow.
Please post WaveGlow-related issues on the WaveGlow repo.
@ksaidin Please share loss curves, predicted mel-spectrograms and alignments.
Hello, my idol!
I am also trying to train WaveGlow from your pretrained model on my language, but it does not include the optimizer state, so I use the default, like this:
- `optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)`
The image shows my training progress after 1 day. Can you tell me when I should stop, and how many epochs are enough if I start from the pretrained model?
Thank you <3
Download the pre-trained WaveGlow model given in the repo, and set the 'checkpoint_path' variable of the config file to the path of the downloaded model.
This triggers a KeyError: 'iteration', so there must be more steps you haven't mentioned in your comment.
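From what I can tell, the missing step is either to patch the checkpoint loading to tolerate the absent keys (as described earlier in this thread) or to re-save the published file with those keys added. A rough one-off sketch of the latter (file names and the learning rate are just examples; run it from inside the WaveGlow repo so the pickled model class can be found):

```python
import torch

# Rough one-off sketch: wrap the published WaveGlow file with the extra keys
# ('iteration', 'optimizer') that the training script's checkpoint loader
# expects. File names and the learning rate below are examples.
ckpt = torch.load('waveglow_256channels_universal_v5.pt', map_location='cpu')
model = ckpt['model'] if isinstance(ckpt, dict) and 'model' in ckpt else ckpt

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fresh optimizer state
torch.save({'model': model,
            'optimizer': optimizer.state_dict(),
            'iteration': 0},
           'waveglow_warmstart.pt')
```

Pointing checkpoint_path at waveglow_warmstart.pt should then let the unmodified training script resume from it.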