Andong-Li-speech/GaGNet

[Question] Which GGM's output should we use after running inference with the model?

MIBlue119 opened this issue · 2 comments

Sorry to bother you, and thank you for open-sourcing the model.
I am trying to train the model and have run into a small problem.
After training, the model returns a list of length three (number of GGMs, default=3).

If I want to run iSTFT to synthesize the result, which index of the output list should I use?
Thanks!

I tried processing the output as shown below.
The sound is better, but there is still a little noise, as if the phase were not completely correct.

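# Take the last element of the output list; real/imag are indexed on dim=1.
estimated_stft = esti_list[-1]
# Magnitude is the squared norm over dim=1; phase is atan2(imag, real).
estimated_stft_mag = torch.norm(estimated_stft, dim=1) ** 2
estimated_stft_phase = torch.atan2(estimated_stft[:, -1, ...], estimated_stft[:, 0, ...])
estimated_stft = torch.stack((estimated_stft_mag * torch.cos(estimated_stft_phase), estimated_stft_mag * torch.sin(estimated_stft_phase)), dim=1)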
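
For the iSTFT step itself, here is a minimal sketch of how the stacked real/imag tensor could be turned back into a waveform. It is not taken from the repository. The helper name `synthesize`, the `(batch, 2, frames, freq_bins)` layout, `n_fft=320`, `hop=160`, and the Hann window are all assumptions for illustration; they would need to match whatever STFT settings were used to produce the model input.

```python
import torch

def synthesize(est, n_fft=320, hop=160):
    """Hypothetical helper: est is (batch, 2, frames, freq_bins), real/imag on dim=1 (assumed)."""
    # (B, 2, T, F) -> (B, F, T, 2): view_as_complex needs the real/imag pair on the last dim.
    spec = torch.view_as_complex(est.permute(0, 3, 2, 1).contiguous())
    # Assumed analysis window; it must be the same window used for the forward STFT.
    window = torch.hann_window(n_fft, device=est.device)
    return torch.istft(spec, n_fft=n_fft, hop_length=hop, win_length=n_fft, window=window)

# Round-trip check on random audio, using the same assumed settings for the forward STFT.
wav = torch.randn(1, 16000)
spec = torch.stft(wav, n_fft=320, hop_length=160, win_length=320,
                  window=torch.hann_window(320), return_complex=True)
est = torch.view_as_real(spec).permute(0, 3, 2, 1)  # (B, F, T, 2) -> (B, 2, T, F)
out = synthesize(est)
```

If the window or hop size here differs from the one used during feature extraction, the reconstruction will contain artifacts similar to the phase-like noise described above, so those parameters are worth checking first.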