Neural voices do not work
marlon-br opened this issue · 5 comments
Hi,
I use Python and TranslationRecognizer to recognize, translate, and synthesize speech. Everything works for standard voices, but for neural voices I get this error:
Audio cancelled: CancellationDetails(reason=CancellationReason.Error, error_details="Synthesis service failed with code: - Could not identify the voice 'en-US-AriaNeural' for the text to speech service ...
This is in the 'eastus' region, which I expect to support neural voices. Any idea why this happens?
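One way to narrow this down is to check whether the region's text-to-speech endpoint lists the voice at all. The sketch below is only an illustration: it assumes the public voices-list REST endpoint and the `ShortName` field in its JSON response, and the helper names (`voices_list_request`, `fetch_voices`, `find_voice`) are hypothetical. It only shows what the text-to-speech endpoint offers; it does not show whether the translation synthesis path accepts the same voices.

```python
import json
import urllib.request


def voices_list_request(region, key):
    """Build a GET request for the region's voices list (assumed endpoint)."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"
    return urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})


def fetch_voices(region, key):
    """Download and decode the voices list; needs network access and a valid key."""
    with urllib.request.urlopen(voices_list_request(region, key), timeout=30) as resp:
        return json.load(resp)


def find_voice(voices, short_name):
    """Return the entries whose ShortName matches, e.g. 'en-US-AriaNeural'."""
    return [v for v in voices if v.get("ShortName") == short_name]


# Example (not executed here, since it needs a real key):
# matches = find_voice(fetch_voices("eastus", "key"), "en-US-AriaNeural")
# print(matches or "voice not listed for this region")
```

If the voice is listed here but the recognizer still rejects it, the problem is more likely in the translation synthesis path than in regional availability.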
I used the 'eastus' region with the voice name 'en-US-AriaNeural' in the minimal Python code sample, and it works well.
Please show your code. It does not work on my side and I want to understand why it happens.
Here is my code:
import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import AudioStreamFormat, AudioConfig
from threading import Event
speech_key, service_region = 'key', 'eastus'
fromLanguage = 'de-DE'
toLanguage = 'en'
# set SpeechTranslationConfig
translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=speech_key, region=service_region)
translation_config.speech_recognition_language = fromLanguage
translation_config.add_target_language(toLanguage)
translation_config.voice_name = "en-US-AriaNeural"
# set AudioStreamFormat
audioFormat = AudioStreamFormat(22050, 16, 1)
custom_push_stream = speechsdk.audio.PushAudioInputStream(stream_format=audioFormat)
# set TranslationRecognizer
audio_config = AudioConfig(stream=custom_push_stream)
recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config,
                                                         audio_config=audio_config)
# event to track recognition completeness
synthesis_done = Event()
# callbacks
def synthesis_callback(evt):
    size = len(evt.result.audio)
    if size > 0:
        # write the synthesized audio to disk
        with open("output.wav", "wb+") as t_sound_file:
            t_sound_file.write(evt.result.audio)

def recognized_complete(evt):
    if evt.result.reason == speechsdk.ResultReason.TranslatedSpeech:
        print("RECOGNIZED '{}': {}".format(fromLanguage, evt.result.text))
        print("TRANSLATED into {}: {}".format(toLanguage, evt.result.translations[toLanguage]))

def cancelled_callback(evt):
    print(f'Audio cancelled: {evt.cancellation_details}')
    synthesis_done.set()

# set callbacks
recognizer.synthesizing.connect(synthesis_callback)
recognizer.recognized.connect(recognized_complete)
recognizer.canceled.connect(cancelled_callback)
# read input and send to recognizer
with open("input.wav", 'rb') as open_audio_file:
    file_bytes = open_audio_file.read()
custom_push_stream.write(file_bytes)
custom_push_stream.close()
# process audio
recognizer.start_continuous_recognition()
synthesis_done.wait()
recognizer.stop_continuous_recognition()
It does not work for "en-US-AriaRUS" either, but it works for "en-US-ZiraRUS" and others. I am on a paid plan, so the issue is not related to trial limitations.
I was referring to the very simple code in this repo: https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/Samples-Http/Python.
Your error could be a Speech SDK problem (that SDK lives in https://github.com/Azure-Samples/cognitive-services-speech-sdk), but I'm not sure. Perhaps you can try a basic HTTPS POST and see whether the same issue occurs.
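For reference, a basic HTTPS POST to the text-to-speech endpoint could look roughly like the sketch below. This is not the code from the linked repo: it assumes the regional `cognitiveservices/v1` endpoint, authenticates with the subscription key header rather than a token, and uses hypothetical helper names (`build_ssml`, `build_request`, `synthesize_to_file`) and an arbitrary `User-Agent` value.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice_name, lang="en-US"):
    """Wrap plain text in a minimal SSML document for the given voice."""
    return (
        f"<speak version='1.0' xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


def build_request(region, key, ssml):
    """Build the POST request for the regional text-to-speech endpoint (assumed URL)."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        "User-Agent": "tts-voice-check",  # arbitrary application name
    }
    return urllib.request.Request(url, data=ssml.encode("utf-8"),
                                  headers=headers, method="POST")


def synthesize_to_file(region, key, text, voice_name, path):
    """Send the request and save the returned audio; needs network and a valid key."""
    request = build_request(region, key, build_ssml(text, voice_name))
    with urllib.request.urlopen(request, timeout=30) as resp:
        audio = resp.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)


# Example (not executed here):
# synthesize_to_file("eastus", "key", "Hello", "en-US-AriaNeural", "output.wav")
```

If this call succeeds for 'en-US-AriaNeural' while the TranslationRecognizer fails with the same key and region, that points at the SDK's translation synthesis rather than at the voice or the subscription.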
If I use the sample code from https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-speech-translation
I get the same result: "de-DE-Hedda" works, but "en-US-AriaNeural" does not.
To see the error, I added the following to the sample code:
def cancelled_callback(evt):
    print(f'Audio cancelled: {evt.cancellation_details}')

recognizer.canceled.connect(cancelled_callback)
I tried the sample you mentioned and it works for the required voices.
I think this is the issue: Azure-Samples/cognitive-services-speech-sdk#673