rany2/edge-tts

Microsoft Edge's online text-to-speech service cannot be used now.

Closed this issue · 6 comments

edge-tts --text "Hello, world!" --write-media hello.mp3

returns:

aiohttp.client_exceptions.ServerTimeoutError: Connection timeout to host wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1?TrustedClientToken=6A5AA1D4EAFF4E9FB37E23D68491D6F4&ConnectionId=7ab700c0e46443f7b5e4e02db4df6776

Could you please help us update to a new TrustedClientToken and ConnectionId so that our service works as usual again?
Thank you so much.

I haven't dug into the issue, but as a quick workaround I wrapped the edge-tts call in a try/except retry loop in Python, which handles it because the problem doesn't occur every time. The loop tries up to {attempts} times to run edge-tts; if edge-tts fails, it waits {attempt_interval} seconds and then tries again:

import asyncio
import logging

import edge_tts

logger = logging.getLogger(__name__)


async def _create_audio_file(text: str, file_name: str,
                             voice: str = "en-US-SteffanNeural",
                             speed: str = "+0%",
                             volume: str = "+0%",
                             attempts: int = 10,
                             attempt_interval: int = 1) -> None:
    for attempt in range(attempts):
        try:
            communicate = edge_tts.Communicate(text, voice, rate=speed,
                                               volume=volume)
            await communicate.save(file_name)
        except edge_tts.exceptions.NoAudioReceived as e:
            logger.exception("edge_tts.exceptions.NoAudioReceived Error: %s, retry attempt: %s", e, attempt + 1)
            if attempt == attempts - 1:  # the last allowed attempt also failed
                logger.exception("Failed to obtain TTS after %s attempts. Max attempts exceeded.", attempts)
                raise
            # We get here if there was an exception and attempts remain:
            # wait without blocking the event loop, then retry.
            await asyncio.sleep(attempt_interval)
            continue
        break

From some quick testing, edge-tts seems to fail with edge_tts.exceptions.NoAudioReceived roughly 5-10% of the time, but every time it succeeded on the second attempt after waiting 1 second.
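
As a rough back-of-the-envelope check on why a handful of retries is enough, here is a small sketch. It assumes each attempt fails independently with the same probability, which is my assumption and not something I measured; the function name is made up for illustration.

```python
def all_attempts_fail_probability(failure_rate: float, attempts: int) -> float:
    """Chance that every attempt fails, ASSUMING independent failures."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return failure_rate ** attempts


if __name__ == "__main__":
    # Observed failure rates from the quick testing above: roughly 5-10%.
    for rate in (0.05, 0.10):
        for attempts in (1, 2, 10):
            p = all_attempts_fail_probability(rate, attempts)
            print(f"rate={rate:.2f} attempts={attempts}: {p:.3g}")
```

Under that assumption, a 10% failure rate with two attempts leaves about a 1% chance of both failing. If the failures are actually correlated (for example during a service outage), retries help much less than this suggests.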

Since I have temporarily switched our TTS service to Windows' built-in SAPI.SpVoice, I will try your suggestion a bit later.
But still, thank you so much for your advice!

I am using

communicate = edge_tts.Communicate(chapter_content, sti)
asyncio.get_event_loop().run_until_complete(communicate.save(output_file))
audio = AudioSegment.from_file(output_file, format="mp3")

And it worked before, now I get

audio = AudioSegment.from_file(output_file, format="mp3")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "audio_segment.py", line 773, in from_file
    raise CouldntDecodeError(
pydub.exceptions.CouldntDecodeError: Decoding failed. ffmpeg returned error code: 1

Output from ffmpeg/avlib:

ffmpeg version 2023-06-21-git-1bcb8a7338-essentials_build-www.gyan.dev 
...
[in#0 @ 000001785c644c40] Error opening input: Invalid argument

AudioSegment is not part of edge-tts (it looks like it's from jiaaro/pydub?). This problem could well be caused by edge-tts not generating a valid mp3 file, but the error message from AudioSegment is not particularly useful here. It would be strange, though, for edge-tts not to raise an edge_tts.exceptions error in that case (usually you would get edge_tts.exceptions.NoAudioReceived if this happened).

So it would be helpful if you could investigate the edge-tts part, before your use of AudioSegment. Check the values of chapter_content and sti to confirm they're valid input for edge_tts.Communicate, and confirm communicate is a valid object. Check that output_file is a valid path for your OS. If it is, confirm that output_file was actually generated and try opening it with an audio player to see whether it's valid.
You can do that by putting a breakpoint at your line audio = AudioSegment.from_file(output_file, format="mp3") and checking those values in the debugger at that line, or by adding something like the following to a testing branch:
In your imports add: import os
Replace the code you included above:

communicate = edge_tts.Communicate(chapter_content, sti)
asyncio.get_event_loop().run_until_complete(communicate.save(output_file))
audio = AudioSegment.from_file(output_file, format="mp3")

with:

communicate = edge_tts.Communicate(chapter_content, sti)
print(f"chapter content: {chapter_content}\nsti: {sti}\noutput_file: {output_file}")
asyncio.get_event_loop().run_until_complete(communicate.save(output_file))
assert os.path.isfile(output_file)
audio = AudioSegment.from_file(output_file, format="mp3")

You might also need to run edge-tts --list-voices to confirm that the voice you pass in (sti) is still a valid voice.
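
If you want something slightly more informative than the bare assert, here is a minimal sketch of the file check. The helper name describe_output_file is hypothetical (it is not part of edge-tts or pydub), and it only uses the standard library. Note that "ok" only means the file exists and is non-empty, not that it is a valid mp3.

```python
import os


def describe_output_file(path: str) -> str:
    """Return a coarse diagnosis of the file edge-tts was asked to write.

    Hypothetical helper for debugging: it checks existence and size only,
    it does NOT verify that the contents are decodable audio.
    """
    if not os.path.isfile(path):
        return "missing"  # save() never produced a file at this path
    if os.path.getsize(path) == 0:
        return "empty"    # file was created but no audio bytes were written
    return "ok"           # exists and is non-empty; may still be invalid mp3
```

Call it right after communicate.save(output_file) and before AudioSegment.from_file, for example print(describe_output_file(output_file)). A result of "missing" or "empty" points at the edge-tts step or the path, while "ok" followed by a decode error points at the file contents or at ffmpeg.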

The PDF I got the data from happened to be corrupted at the same time Edge seemingly wasn't available, a classic example of correlation mistaken for causation.