How to run inference with MelGAN given a Tacotron mel spec output?
OswaldoBornemann opened this issue · 11 comments
When I trained MelGAN on mel specs computed from the original wavs, the results were good.
But when I fed the Tacotron mel spec output into the trained MelGAN model, the sound was just a beep. Would you mind sharing some advice? Thanks a lot. @seungwonpark
Can you upload sound samples?
@CookiePPP Please set your volume to the lowest level... I don't want to hurt your ears...
Do you have the code you used to feed the tacotron outputs into melgan uploaded somewhere?
That's definitely bugged out.
@CookiePPP The process is roughly as follows:
First I get the mel spec output from Tacotron, like this:
# mel_sent has shape (spec_length, 80)
mel_sent = tacotron_out(model, sentence, CONFIG, use_cuda, ap, use_gl=use_gl, figures=True)
Then I unsqueeze and transpose the mel result to feed it into MelGAN.
checkpoint_path = "./melgan/chkpt/id_test1/id_test1_aca5990_0700.pt"
config = "./melgan/config/id_test1.yaml"
checkpoint = torch.load(checkpoint_path)
# if args.config is not None:
# hp = HParam(config)
# else:
hp = load_hparam_str(checkpoint['hp_str'])
melgan_model = Generator(hp.audio.n_mel_channels).cuda()
melgan_model.load_state_dict(checkpoint['model_g'])
melgan_model.eval()
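with torch.no_grad():
    # (spec_length, 80) -> (1, 80, spec_length)
    mel = torch.from_numpy(mel_sent).unsqueeze(0).transpose(2, 1)
    mel = mel.cuda()
    # call the MelGAN generator, not the Tacotron `model`
    audio = melgan_model.inference(mel)
    audio = audio.cpu().detach().numpy()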
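The unsqueeze-and-transpose step above can be checked in isolation. The following is a minimal sketch using NumPy only; the helper name `to_melgan_layout` is hypothetical and is not part of MelGAN or Mozilla TTS. It assumes the Tacotron output is a 2-D array of shape (spec_length, 80), as stated in the comment above.

```python
import numpy as np


def to_melgan_layout(mel_sent: np.ndarray, n_mel_channels: int = 80) -> np.ndarray:
    """Convert a (spec_length, n_mels) array to (1, n_mels, spec_length).

    Hypothetical helper for illustration; it mirrors
    `unsqueeze(0).transpose(2, 1)` from the snippet above.
    """
    if mel_sent.ndim != 2 or mel_sent.shape[1] != n_mel_channels:
        raise ValueError(
            f"expected shape (spec_length, {n_mel_channels}), got {mel_sent.shape}"
        )
    # Transpose to (n_mels, spec_length), then add a leading batch axis.
    batched = mel_sent.T[np.newaxis, :, :]
    # Return a contiguous float32 copy, the usual dtype for model input.
    return np.ascontiguousarray(batched, dtype=np.float32)
```

A shape mismatch would normally raise an error rather than produce noise, so if this check passes, the value range of the spectrogram is the more likely suspect.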
mel_sent = tacotron_out(model, sentence, CONFIG, use_cuda, ap, use_gl=use_gl, figures=True)
Where does this line come from? This repo is designed to interface with NVIDIA/Tacotron.
Nvidia uses their own Spectrogram conversion that I believe outputs values between -12 and 2.
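One way to test this hypothesis is to compare the value range of the spectrogram being fed to MelGAN against the range mentioned above. This is a rough sketch only: the function name `mel_range_report` and the `tol` parameter are made up for illustration, and the -12 to 2 bounds are the estimate from this comment, not a verified specification.

```python
import numpy as np


def mel_range_report(mel, expected_min=-12.0, expected_max=2.0, tol=1.0):
    """Report min/max of a mel spectrogram and whether it fits a target range.

    expected_min / expected_max default to the approximate range quoted in
    this thread; tol is an arbitrary slack chosen for illustration.
    """
    lo = float(np.min(mel))
    hi = float(np.max(mel))
    in_range = (lo >= expected_min - tol) and (hi <= expected_max + tol)
    return {"min": lo, "max": hi, "in_expected_range": in_range}
```

Running it on both a mel spec computed from a training wav and on the Tacotron output should show quickly whether the two live on different scales.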
@CookiePPP I see. I use Mozilla TTS instead.
@CookiePPP I would like to know whether we could use Tacotron GTA output to train MelGAN.
@tsungruihon
You should be able to scale the output and get an audible result. I don't know what range Mozilla TTS has, but try to transform the Mozilla output to match the Nvidia one.
e.g.
mel_sent = tacotron_out(model, sentence, CONFIG, use_cuda, ap, use_gl=use_gl, figures=True)
mel_sent = (mel_sent * 0.5) + 2
and replace 0.5 and +2 with the values that move the spectrogram between -12 and 2.
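The scale and offset can be derived from the source range instead of being guessed. Below is a minimal sketch of that linear mapping; `rescale_mel` is a hypothetical helper, and `src_min` / `src_max` have to be measured from the Mozilla TTS output or read from its config, since the thread does not state them. A linear rescale only matches the value range; it does not guarantee that the two spectrograms use the same log base or normalization.

```python
import numpy as np


def rescale_mel(mel, src_min, src_max, dst_min=-12.0, dst_max=2.0):
    """Linearly map values from [src_min, src_max] to [dst_min, dst_max].

    Returns the rescaled array together with the scale and offset used,
    so that `mel * scale + offset` can be reused elsewhere.
    The default target range is the approximate one quoted in this thread.
    """
    if src_max <= src_min:
        raise ValueError("src_max must be greater than src_min")
    scale = (dst_max - dst_min) / (src_max - src_min)
    offset = dst_min - src_min * scale
    return np.asarray(mel) * scale + offset, scale, offset
```

For example, if the source spectrogram were in the range -4 to 4 (an assumed range, for illustration only), this gives a scale of 1.75 and an offset of -5.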
@CookiePPP I would like to know whether we could use Tacotron GTA output to train MelGAN.
Not sure, I'm busy today so I can't really help you there.
@CookiePPP Really appreciated. Thanks a lot.
I face the same problem. Did you find a solution?
@tsungruihon
Please visit https://github.com/mozilla/TTS