espnet/espnet_onnx

wav quality drop

1nlplearner opened this issue · 2 comments

Hi, I initialized text2speech with my own am_model and vocoder and exported the ONNX models, but the sound quality drops significantly. The only change I made was to the HiFi-GAN inference code in https://github.com/Masao-Someki/espnet_onnx/blob/feature/add_PWGVocoder/espnet_onnx/export/tts/models/vocoders/parallel_wavegan.py, because the HiFi-GAN code in the ParallelWaveGAN repo does not support the parameter x.
I also compared the ESPnet am and vocoder with the ONNX am and vocoder, and they look the same.
Could you please offer some advice?

When I delete the postprocess code in https://github.com/Masao-Someki/espnet_onnx/blob/master/espnet_onnx/tts/tts_model.py, the model synthesizes speech the same way PyTorch inference does.

@1nlplearner
Thank you for reporting this issue.

> when i delete postprocess code in https://github.com/Masao-Someki/espnet_onnx/blob/master/espnet_onnx/tts/tts_model.py ,model can synthesis voice as pytorch inferencing

It seems that the normalization step causes this issue. Would you open your config file at ~/.cache/espnet_onnx/<tag_name>/config.yml and check whether use_normalize is set to False? I think setting use_normalize: false will fix this problem.
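
For anyone who wants to script that check, here is a minimal sketch that flips every use_normalize entry in a config file to false. It edits the file as plain text so it does not depend on where the key sits in the YAML tree, which is an assumption on my side since the layout of config.yml is not shown in this thread. The helper names disable_normalize and patch_config are hypothetical and not part of espnet_onnx.

```python
import re
from pathlib import Path

# Matches a "use_normalize: <value>" line at any indentation level.
# Assumption: the key appears as a plain mapping entry on its own line.
_PATTERN = re.compile(r"^([ \t]*use_normalize[ \t]*:[ \t]*)(\S+)", re.MULTILINE)


def disable_normalize(config_text):
    """Return (new_text, count) with every use_normalize value set to false."""
    return _PATTERN.subn(r"\g<1>false", config_text)


def patch_config(path):
    """Rewrite the config in place, keeping a .bak copy of the original.

    Returns the number of use_normalize entries found in the file.
    """
    path = Path(path).expanduser()
    original = path.read_text(encoding="utf-8")
    patched, count = disable_normalize(original)
    if patched != original:
        # Keep the untouched file next to the patched one.
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(patched, encoding="utf-8")
    return count
```

Calling patch_config on ~/.cache/espnet_onnx/<tag_name>/config.yml, with <tag_name> replaced by your own tag, should report how many entries it found. A result of 0 means the key is not written in the form the pattern expects, and the file needs to be edited by hand.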