Dynamic input for TTS
robinhad opened this issue · 2 comments
robinhad commented
Hi, thanks for your work on this great project!
I tried exporting the model as specified in the example:
from espnet_onnx.export import TTSModelExport
m = TTSModelExport()
tag_name = 'kan-bayashi/ljspeech_vits'
# download with espnet_model_zoo and export from pretrained model
m.export_from_pretrained(tag_name, quantize=True)
Then I tried to use the model:
from espnet_onnx import Text2Speech
import IPython
tag_name = 'kan-bayashi/ljspeech_vits'
text2speech = Text2Speech(tag_name, use_quantized=True)
text = 'This model is so small it can be run on a smartwatch! This model is so small it can be run on a smartwatch!'
output_dict = text2speech(text) # inference with onnx model.
wav = output_dict['wav']
IPython.display.Audio(data=wav, rate=22050)
But I get the following message:
2022-10-01 12:31:15.826208114 [E:onnxruntime:, sequential_executor.cc:364 Execute] Non-zero status code returned while running Gather node. Name:'Gather_3861' Status Message: indices element out of data bounds, idx=712 must be within the inclusive range [-512,511]
Traceback (most recent call last):
  File "<path>/onnx-tts/use_onnx.py", line 8, in <module>
    output_dict = text2speech(text) # inference with onnx model.
  File "<path>/onnx-tts/.venv/lib/python3.9/site-packages/espnet_onnx/tts/tts_model.py", line 86, in __call__
    output_dict = self.tts_model(text, **options)
  File "<path>/onnx-tts/.venv/lib/python3.9/site-packages/espnet_onnx/tts/model/tts_models/vits.py", line 58, in __call__
    wav, att_w, dur = self.model.run(output_names, input_dict)
  File "<path>/onnx-tts/.venv/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 192, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'Gather_3861' Status Message: indices element out of data bounds, idx=712 must be within the inclusive range [-512,511]
I believe it's because the input length is fixed at export time.
I can try to fix this, but do you know how to handle it?
Masao-Someki commented
Hi @robinhad, you can try setting max_seq_len to a value larger than 712, as follows:
from espnet_onnx.export import TTSModelExport
m = TTSModelExport()
m.set_export_config(
    max_seq_len=1024
)
m.export_from_pretrained(...)
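To illustrate why a larger value helps, here is a minimal pure-Python sketch of the bounds rule quoted in the error message. It is not ONNX Runtime's implementation, and `gather`, `table_512`, and `table_1024` are hypothetical names used only for illustration. It also assumes that the table being gathered from has one row per position and that its size follows max_seq_len, which the error range [-512,511] suggests but the thread does not confirm.

```python
def gather(table, indices):
    """Pick rows from `table`, applying the bounds rule quoted in the error message."""
    n = len(table)
    picked_rows = []
    for idx in indices:
        # Valid indices lie in the inclusive range [-n, n - 1].
        if not -n <= idx <= n - 1:
            raise IndexError(
                f"indices element out of data bounds, idx={idx} "
                f"must be within the inclusive range [{-n},{n - 1}]"
            )
        picked_rows.append(table[idx])
    return picked_rows


# Hypothetical stand-ins for a table with one row per position (assumption).
table_512 = list(range(512))
table_1024 = list(range(1024))

message = None
try:
    # Position 712 does not exist in a 512-row table, so this raises.
    gather(table_512, [0, 511, 712])
except IndexError as err:
    message = str(err)

# With 1024 rows, the same indices are all in range.
picked = gather(table_1024, [0, 511, 712])
print(message)
print(picked)
```

Under these assumptions, any max_seq_len above the largest index the model produces (712 here) would avoid the error; 1024 simply leaves some headroom for longer inputs.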
robinhad commented
Thanks a lot!