espnet/espnet_onnx

Dynamic input for TTS

robinhad opened this issue · 2 comments

Hi, thanks for your work on this great project!

I tried exporting the model as specified in the example:

from espnet_onnx.export import TTSModelExport

m = TTSModelExport()

tag_name = 'kan-bayashi/ljspeech_vits'
# download with espnet_model_zoo and export from pretrained model
m.export_from_pretrained(tag_name, quantize=True)

Then I tried to use the model:

from espnet_onnx import Text2Speech
import IPython

tag_name = 'kan-bayashi/ljspeech_vits'
text2speech = Text2Speech(tag_name, use_quantized=True)

text = 'This model is so small it can be run on a smartwatch! This model is so small it can be run on a smartwatch!'
output_dict = text2speech(text) # inference with onnx model.
wav = output_dict['wav']
IPython.display.Audio(data=wav, rate=22050)

But I get the following message:

2022-10-01 12:31:15.826208114 [E:onnxruntime:, sequential_executor.cc:364 Execute] Non-zero status code returned while running Gather node. Name:'Gather_3861' Status Message: indices element out of data bounds, idx=712 must be within the inclusive range [-512,511]
Traceback (most recent call last):
  File "<path>/onnx-tts/use_onnx.py", line 8, in <module>
    output_dict = text2speech(text) # inference with onnx model.
  File "<path>/onnx-tts/.venv/lib/python3.9/site-packages/espnet_onnx/tts/tts_model.py", line 86, in __call__
    output_dict = self.tts_model(text, **options)
  File "<path>/onnx-tts/.venv/lib/python3.9/site-packages/espnet_onnx/tts/model/tts_models/vits.py", line 58, in __call__
    wav, att_w, dur = self.model.run(output_names, input_dict)
  File "<path>/onnx-tts/.venv/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 192, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'Gather_3861' Status Message: indices element out of data bounds, idx=712 must be within the inclusive range [-512,511]

I believe this happens because the input size is fixed at export time.
I can try to fix this myself, but do you know how to handle it?

Hi @robinhad, you can try setting max_seq_len to a value larger than 712, as follows:

from espnet_onnx.export import TTSModelExport

m = TTSModelExport()
m.set_export_config(
    max_seq_len=1024
)
m.export_from_pretrained(...)
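
If you are unsure which value to pick, here is a rough sketch that reads the failing index out of the onnxruntime message and rounds up to the next size that would cover it. Note that suggest_max_seq_len is a hypothetical helper written for this thread, not part of espnet_onnx, and it assumes the message keeps the "idx=... must be within the inclusive range [...]" wording shown in the log above. It only reflects the one input that failed, so longer texts may still need a larger value.

```python
import re

# Hypothetical helper, not part of espnet_onnx or onnxruntime.
# Assumes the error text follows the format seen in the log above.
_BOUNDS_RE = re.compile(
    r"idx=(-?\d+) must be within the inclusive range \[(-?\d+),(-?\d+)\]"
)


def suggest_max_seq_len(error_message):
    """Return a max_seq_len that would cover the failing index, or None."""
    match = _BOUNDS_RE.search(error_message)
    if match is None:
        return None  # message is not an out-of-bounds Gather error

    idx = int(match.group(1))
    upper = int(match.group(3))

    # Indices are zero-based, so reaching idx needs idx + 1 entries
    # (or -idx entries for a negative index).
    needed = idx + 1 if idx >= 0 else -idx

    # Start from the current table size and double until it is large enough.
    size = max(upper + 1, 1)
    while size < needed:
        size *= 2
    return size


if __name__ == "__main__":
    message = (
        "indices element out of data bounds, idx=712 must be within "
        "the inclusive range [-512,511]"
    )
    print(suggest_max_seq_len(message))
```

For the error in this issue, the sketch lands on 1024, the same value used in the export config above.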

Thanks a lot!