Azure-Samples/Cognitive-Speech-TTS

Neural voices do not work

marlon-br opened this issue · 5 comments

Hi,

I use Python and TranslationRecognizer to recognize, translate, and synthesize speech. Everything works with standard voices, but with neural voices I get this error:
Audio cancelled: CancellationDetails(reason=CancellationReason.Error, error_details="Synthesis service failed with code: - Could not identify the voice 'en-US-AriaNeural' for the text to speech service ...

This is in the 'eastus' region, so I expect neural voices to be supported. Any idea why this happens?
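One way to check whether a voice name is actually offered to your key in a given region is to fetch the region's voice list over REST and look for its short name. The sketch below assumes the `https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list` endpoint and the `ShortName` and `VoiceType` fields of its JSON response; the helper function names are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.request
from typing import List

# Assumed REST endpoint that lists the voices available in a region
VOICES_URL = "https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"


def build_voices_list_request(region: str, key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the region's voice list."""
    return urllib.request.Request(
        VOICES_URL.format(region=region),
        headers={"Ocp-Apim-Subscription-Key": key},
        method="GET",
    )


def neural_short_names(voices_json: str) -> List[str]:
    """Return the sorted ShortName of every voice whose VoiceType is 'Neural'."""
    voices = json.loads(voices_json)
    return sorted(v["ShortName"] for v in voices if v.get("VoiceType") == "Neural")


def fetch_neural_voices(region: str, key: str) -> List[str]:
    """Perform the network call; requires a valid subscription key."""
    with urllib.request.urlopen(build_voices_list_request(region, key)) as resp:
        return neural_short_names(resp.read().decode("utf-8"))
```

If `'en-US-AriaNeural' in fetch_neural_voices('eastus', speech_key)` is `True`, regional availability is probably not the cause of the error.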

I used the 'eastus' region with the voice name 'en-US-AriaNeural' in the minimal Python code sample, and it works well.

Could you show your code? It does not work on my side, and I want to understand why.

My code is as follows:

```python
import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import AudioStreamFormat, AudioConfig
from threading import Event

speech_key, service_region = 'key', 'eastus'

fromLanguage = 'de-DE'
toLanguage = 'en'

# set SpeechTranslationConfig: German speech in, English text and synthesized speech out
translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=speech_key, region=service_region)
translation_config.speech_recognition_language = fromLanguage
translation_config.add_target_language(toLanguage)
translation_config.voice_name = "en-US-AriaNeural"

# set AudioStreamFormat: 22050 Hz, 16-bit, mono PCM
audioFormat = AudioStreamFormat(22050, 16, 1)
custom_push_stream = speechsdk.audio.PushAudioInputStream(stream_format=audioFormat)

# set TranslationRecognizer
audio_config = AudioConfig(stream=custom_push_stream)
recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config,
                                                         audio_config=audio_config)

# event to track recognition completeness
synthesis_done = Event()

# callbacks
def synthesis_callback(evt):
    # a zero-length chunk marks the end of synthesis;
    # every non-empty chunk overwrites output.wav
    if len(evt.result.audio) > 0:
        with open("output.wav", "wb") as t_sound_file:
            t_sound_file.write(evt.result.audio)


def recognized_complete(evt):
    if evt.result.reason == speechsdk.ResultReason.TranslatedSpeech:
        print("RECOGNIZED '{}': {}".format(fromLanguage, evt.result.text))
        print("TRANSLATED into {}: {}".format(toLanguage, evt.result.translations[toLanguage]))


def cancelled_callback(evt):
    print(f'Audio cancelled: {evt.cancellation_details}')
    synthesis_done.set()


def session_stopped_callback(evt):
    # also stop waiting if the session ends without a cancellation
    synthesis_done.set()

# set callbacks
recognizer.synthesizing.connect(synthesis_callback)
recognizer.recognized.connect(recognized_complete)
recognizer.canceled.connect(cancelled_callback)
recognizer.session_stopped.connect(session_stopped_callback)

# read input and send it to the recognizer, then signal end of stream
with open("input.wav", 'rb') as open_audio_file:
    custom_push_stream.write(open_audio_file.read())
custom_push_stream.close()

# process audio
recognizer.start_continuous_recognition()
synthesis_done.wait()
recognizer.stop_continuous_recognition()
```

It also fails for "en-US-AriaRUS", but works for "en-US-ZiraRUS" and other standard voices. I am on a paid plan, so the issue is not related to free-trial limitations.
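As a side note on the synthesizing callback above: every non-empty chunk reopens and overwrites output.wav, so only the last non-empty chunk survives. A minimal sketch of keeping all of them, assuming each non-empty `evt.result.audio` is one self-contained utterance and that a zero-length chunk only marks the end of synthesis (the `UtteranceCollector` class and the `output_N.wav` file names are hypothetical):

```python
import os


class UtteranceCollector:
    """Hypothetical helper that keeps every non-empty synthesized chunk.

    Assumes each non-empty evt.result.audio is one self-contained utterance
    and that a zero-length chunk only signals that synthesis has finished.
    """

    def __init__(self):
        self.chunks = []

    def add(self, audio):
        # Skip the zero-length end-of-synthesis chunk.
        if audio:
            self.chunks.append(bytes(audio))

    def on_synthesizing(self, evt):
        # Intended for recognizer.synthesizing.connect(...)
        self.add(evt.result.audio)

    def save(self, directory=".", prefix="output"):
        # Write one numbered file per chunk, e.g. output_0.wav, output_1.wav.
        paths = []
        for i, chunk in enumerate(self.chunks):
            path = os.path.join(directory, f"{prefix}_{i}.wav")
            with open(path, "wb") as f:
                f.write(chunk)
            paths.append(path)
        return paths
```

It would be hooked up with `collector = UtteranceCollector()` and `recognizer.synthesizing.connect(collector.on_synthesizing)`, followed by `collector.save()` after `stop_continuous_recognition()`.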

I was referring to the very simple code in this repo: https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/Samples-Http/Python.

Your error could be a Speech SDK problem (the SDK lives in https://github.com/Azure-Samples/cognitive-services-speech-sdk), but I'm not sure. Perhaps you can try a basic HTTPS POST and see whether the same issue happens.
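Such a basic HTTPS POST can be sketched with the standard library alone. This assumes the REST endpoint `https://{region}.tts.speech.microsoft.com/cognitiveservices/v1`, an SSML request body, and the `Ocp-Apim-Subscription-Key`, `Content-Type`, and `X-Microsoft-OutputFormat` headers; the function names and the `User-Agent` value are hypothetical.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Assumed REST endpoint for text-to-speech
TTS_URL = "https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"


def build_ssml(text, voice, lang="en-US"):
    """Minimal SSML selecting one voice; text and attribute values are XML-escaped."""
    return (
        f"<speak version='1.0' xml:lang={quoteattr(lang)}>"
        f"<voice xml:lang={quoteattr(lang)} name={quoteattr(voice)}>"
        f"{escape(text)}"
        "</voice></speak>"
    )


def build_tts_request(region, key, ssml, output_format="riff-24khz-16bit-mono-pcm"):
    """Build (but do not send) the POST request."""
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "neural-voice-check",  # hypothetical client name
    }
    return urllib.request.Request(
        TTS_URL.format(region=region),
        data=ssml.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def synthesize_to_file(region, key, text, voice, path):
    """Send the request and save the returned audio; requires a valid key."""
    request = build_tts_request(region, key, build_ssml(text, voice))
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

For example, `synthesize_to_file('eastus', speech_key, 'Hello', 'en-US-AriaNeural', 'aria.wav')` should write a WAV file if the voice is available to your key; if it does and the SDK still fails, the problem likely lies in the SDK path rather than the service.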

If I use the sample code from https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-speech-translation, I get the same result: "de-DE-Hedda" works, but "en-US-AriaNeural" does not.
To see the actual error, I added the following to the code:

```python
def cancelled_callback(evt):
    print(f'Audio cancelled: {evt.cancellation_details}')

recognizer.canceled.connect(cancelled_callback)
```


I tried the sample you mentioned, and it works for the voices I need.

I think this is the issue: Azure-Samples/cognitive-services-speech-sdk#673