Why is the audio corresponding to the mel feature needed when synthesizing?

Question

NewEricWang opened this issue 6 years ago · 3 comments

I only want to input the mel-feature generated from tacotron2. How should I modify the script "synthesize.py"?

Answer 1 · 2019-03-05T06:40:47.000Z

Only the audio size is required, not the audio itself. You can replace one line of code in def synthesize in synthesize.py with the modified version below.

Current code:
q_0 = Normal(x.new_zeros(x.size()), x.new_ones(x.size()))

Replace it with:
q_0 = Normal(c.new_zeros(1,1,c.size()[2]*256), c.new_ones(1,1,c.size()[2]*256))

256 is the hop_length from preprocessing.py, so c.size()[2]*256 is the number of audio samples that the mel frames in c correspond to.
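To make the shape arithmetic concrete, here is a minimal sketch in plain Python. The helper name `noise_shape` and the example mel shape `(1, 80, 400)` are hypothetical, chosen only for illustration; it assumes the mel tensor is laid out as (batch, n_mels, frames), as the indexing `c.size()[2]` above suggests. Unlike the one-line fix, it carries the batch size through instead of hard-coding 1.

```python
def noise_shape(mel_shape, hop_length=256):
    """Shape of the noise tensor needed for a given mel shape.

    mel_shape is assumed to be (batch, n_mels, frames).
    hop_length must match the value used in preprocessing (256 in this thread).
    """
    batch, _n_mels, frames = mel_shape
    # Each mel frame covers hop_length audio samples.
    return (batch, 1, frames * hop_length)


# Hypothetical example: one utterance, 80 mel bins, 400 frames.
shape = noise_shape((1, 80, 400))
print(shape)  # (1, 1, 102400)
```

The resulting tuple could then be passed to `c.new_zeros(...)` and `c.new_ones(...)` in place of the hard-coded `1,1,c.size()[2]*256`. This presumably only gives sensible audio if the Tacotron 2 mel was computed with the same hop length and mel settings the vocoder was trained on.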

Answer 2 · 2019-03-05T08:26:20.000Z

@anupam456, thank you! It works.

Answer 3 · 2019-03-10T09:24:43.000Z

Hi @NewEricWang, which tacotron2 project do you use?