dnhkng/GlaDOS

'/dp/flows.7/GatherElements_3' crash

cushycrux opened this issue · 7 comments

Regarding this crash, which happens with other AI voices:

2024-07-02 14:34:51.2481163 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running GatherElements node. Name:'/dp/flows.7/GatherElements_3' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\gather_elements.cc:154 onnxruntime::core_impl GatherElements op: Out of range value in index tensor

Exception in thread Thread-2 (process_TTS_thread):
Traceback (most recent call last):
  File "C:\Users\Dave\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1052, in _bootstrap_inner
    self.run()
  File "C:\Users\Dave\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 989, in run
    self._target(*self._args, **self._kwargs)
  File "F:\GlaDOS-main\glados.py", line 365, in process_TTS_thread
    audio = self._tts.generate_speech_audio(generated_text)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\GlaDOS-main\glados\tts.py", line 310, in generate_speech_audio
    audio_chunk = self._say_phonemes(sentence)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\GlaDOS-main\glados\tts.py", line 405, in _say_phonemes
    audio = self._synthesize_ids_to_raw(phoneme_ids)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\GlaDOS-main\glados\tts.py", line 389, in _synthesize_ids_to_raw
    audio = self.session.run(
            ^^^^^^^^^^^^^^^^^
  File "F:\GlaDOS-main\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running GatherElements node. Name:'/dp/flows.7/GatherElements_3' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\gather_elements.cc:154 onnxruntime::core_impl GatherElements op: Out of range value in index tensor

I found this hint:
Just encountered the same issue. A downgrade to 1.17.1 resolved it for me; it seems to be a bug in version 1.18.
Source: rhasspy/piper#520
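A minimal sketch of a startup check based on that hint. It assumes only the 1.18 series is affected, which is all this thread reports; the helper names `is_affected` and `installed_onnxruntime_version` are hypothetical and not part of GlaDOS or onnxruntime.

```python
from importlib import metadata
from typing import Optional

# Assumption: only the 1.18 series was reported as affected in this thread.
AFFECTED_SERIES = (1, 18)


def is_affected(version: str) -> bool:
    """Return True if `version` belongs to the onnxruntime 1.18 series."""
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        return False  # Unparseable version string: do not guess.
    return major_minor == AFFECTED_SERIES


def installed_onnxruntime_version() -> Optional[str]:
    """Return the installed onnxruntime version, or None if not installed."""
    for dist in ("onnxruntime", "onnxruntime-gpu"):
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    version = installed_onnxruntime_version()
    if version is None:
        print("onnxruntime is not installed")
    elif is_affected(version):
        print(f"onnxruntime {version} detected; consider: pip install onnxruntime==1.17.1")
    else:
        print(f"onnxruntime {version} detected")
```

The downgrade itself is the `pip install onnxruntime==1.17.1` command printed above; the check only makes the affected version visible before the TTS thread starts.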

Thanks for the bug report, I will investigate!

Win10 / RTX3070
Tested with onnxruntime 1.17.1. It has been rock stable for over 2 hours with a context_length of 7360 (config.yml). But these messages appear on start:

2024-07-02 18:06:04.6651086 [W:onnxruntime:, graph.cc:3593 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '620'. It is not used by any node and should be removed from the model.
2024-07-02 18:06:04.6692734 [W:onnxruntime:, graph.cc:3593 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '623'. It is not used by any node and should be removed from the model.
2024-07-02 18:06:04.6734261 [W:onnxruntime:, graph.cc:3593 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '625'. It is not used by any node and should be removed from the model.
2024-07-02 18:06:04.6778254 [W:onnxruntime:, graph.cc:3593 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '629'. It is not used by any node and should be removed from the model.
2024-07-02 18:06:04.6818256 [W:onnxruntime:, graph.cc:3593 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '628'. It is not used by any node and should be removed from the model.
2024-07-02 18:06:04.6858567 [W:onnxruntime:, graph.cc:3593 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '131'. It is not used by any node and should be removed from the model.
2024-07-02 18:06:04.6898914 [W:onnxruntime:, graph.cc:3593 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '134'. It is not used by any node and should be removed from the model.
2024-07-02 18:06:04.6943196 [W:onnxruntime:, graph.cc:3593 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '136'. It is not used by any node and should be removed from the model.
2024-07-02 18:06:04.6983800 [W:onnxruntime:, graph.cc:3593 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '140'. It is not used by any node and should be removed from the model.
2024-07-02 18:06:04.7023939 [W:onnxruntime:, graph.cc:3593 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer '139'. It is not used by any node and should be removed from the model.
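If these warnings become a nuisance, onnxruntime lets a session raise its minimum log severity. The following is a sketch only: the helper names are hypothetical, where GlaDOS builds its session is not shown in this thread, and whether every one of these messages is silenced this way is an assumption.

```python
# Severity levels used by onnxruntime logging (0 = most verbose).
SEVERITY_LEVELS = {"verbose": 0, "info": 1, "warning": 2, "error": 3, "fatal": 4}


def severity_level(name: str) -> int:
    """Map a severity name to the integer onnxruntime expects."""
    return SEVERITY_LEVELS[name.lower()]


def make_session_options(min_severity: str = "error"):
    """Build SessionOptions that only log messages at `min_severity` or above."""
    import onnxruntime as ort  # Imported lazily so severity_level works without it.

    options = ort.SessionOptions()
    options.log_severity_level = severity_level(min_severity)
    return options
```

The options would then be passed when creating the session, for example `onnxruntime.InferenceSession(model_path, sess_options=make_session_options())`, where `model_path` stands in for whatever model file is being loaded.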

Will test multiple voices with speaker_id tomorrow. Before the onnxruntime downgrade, they crashed almost immediately after starting the app.

Great!

I can live with some weird logging message in exchange for a stable system. Make a Pull Request, so you get credit for the fix!

After testing piper voices for almost 4 hours with a wide variety of speaker IDs (https://rhasspy.github.io/piper-samples/), I can say it's 100% stable with onnxruntime 1.17.1.

One question, David: did you use piper to train the voice?

Next task for me would be to find out how to convert pth voices to onnx (which we can do with https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI). The problem is that piper doesn't accept the format, and the JSON is missing. So I have no idea.

Also, I don't understand how to make a pull request (it looks very complicated), but I don't need any credit anyway. So I am fine.

Yes, but it was a while ago. Make the PR for the fix, so you get credit.

I'll try and find the code that does the conversion.

I think it's roughly:

import torch

# Load the TorchScript voice model.
voice_model = torch.jit.load("models/glados.pt")

# Dummy input: one sequence of 50 integer IDs in the range [0, 135).
input_tensor = torch.randint(low=0, high=135, size=(1, 50))

# Re-trace the model with the dummy input before exporting.
voice_model = torch.jit.trace(voice_model, input_tensor)

# Sanity check: run the model once and list the keys of its output.
output_tensor = voice_model(input_tensor)
print(output_tensor.keys())

# Export to ONNX with named input and output.
torch.onnx.export(
    voice_model,
    input_tensor,
    "glados.onnx",
    input_names=["input"],
    output_names=["output"],
    verbose=True,
)

But it was a while ago! If that doesn't work, let me know and I will dig up and publish the full conversion code, from samples to a final model. Again, I prefer not to use Piper, as it's a level of abstraction that's not particularly useful.

Closing this now, seems resolved