larynx.text_to_speech() function doesn't work
asters-a opened this issue · 1 comment
asters-a commented
Hi there,
I can't get the larynx.text_to_speech Python function to work. It logs the following warnings, and the audio that plays is just noise:
2023-04-03 14:51:03.305121301 [W:onnxruntime:, execution_frame.cc:835 VerifyOutputSizes] Expected shape from model of {1,80,244} does not match actual shape of {1,80,234} for output 3453
2023-04-03 14:51:03.325693586 [W:onnxruntime:, execution_frame.cc:835 VerifyOutputSizes] Expected shape from model of {-1,-1,244} does not match actual shape of {1,80,234} for output output
2023-04-03 14:51:03.421273662 [W:onnxruntime:, execution_frame.cc:835 VerifyOutputSizes] Expected shape from model of {-1,1,12800} does not match actual shape of {1,1,59904} for output audio
I know I can do a curl call to the larynx server, and it works properly when I do, but I want to use it without needing to run the server. I want to mention that I did have it working properly with a previous version, when the text_to_speech function didn't require the model and vocoder parameters, but I can't make it work with the newer version.
Can anyone help? Here's my test code:
import larynx
import pyaudio
# Assumed import path for the enums used below
from larynx.constants import TextToSpeechType, VocoderType

# Load the TTS model and the vocoder
larynx_model = larynx.load_tts_model(TextToSpeechType.GLOW_TTS, "en-us/southern_english_female-glow_tts")
larynx_vocoder = larynx.load_vocoder_model(VocoderType.HIFI_GAN, "hifi_gan/vctk_small")
audio_settings = larynx.AudioSettings()

tts_result = larynx.text_to_speech(
    text="Hello there",
    lang="en",
    tts_model=larynx_model,
    vocoder_model=larynx_vocoder,
    audio_settings=audio_settings,
)

# Play each synthesized result through PyAudio
for result in tts_result:
    p = pyaudio.PyAudio()
    stream = p.open(
        format=p.get_format_from_width(audio_settings.sample_bytes),
        channels=audio_settings.channels,
        rate=audio_settings.sample_rate,
        output=True,
    )
    stream.write(result[1].tobytes())
    stream.stop_stream()
    stream.close()
    p.terminate()
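One thing that may be worth ruling out when playback is pure noise is a sample-format mismatch: the stream is opened with a width taken from audio_settings.sample_bytes, while result[1].tobytes() writes whatever dtype the array happens to have. The sketch below is not Larynx code and only illustrates the conversion. It assumes the samples are floats in [-1.0, 1.0], which is not confirmed for Larynx, and the helper names floats_to_pcm16 and write_wav and the 22050 Hz rate are hypothetical placeholders. It uses only the standard library to turn such samples into 16-bit PCM and dump them to a WAV container that can be inspected offline:

```python
import io
import struct
import wave


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        ints.append(int(round(s * 32767)))
    return struct.pack("<%dh" % len(ints), *ints)


def write_wav(file, pcm_bytes, sample_rate, channels=1):
    """Write 16-bit PCM bytes to a WAV path or file-like object."""
    with wave.open(file, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)


# Synthetic samples stand in for real TTS output; 22050 Hz is a placeholder.
buffer = io.BytesIO()
write_wav(buffer, floats_to_pcm16([0.0, 0.5, -0.5, 1.0]), sample_rate=22050)
```

If a file written this way sounds correct while the PyAudio stream does not, the problem is more likely in the playback format than in the model output. If both are noise, the shape warnings from onnxruntime are the more likely lead.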
synesthesiam commented
Please take a look at Piper, the successor to Larynx: https://github.com/rhasspy/piper/