desbma/GoogleSpeech

Sentences are cut at the beginning and the end with speech.play()

Closed this issue · 10 comments

After installing google-speech, SoX, and the additional DLL, and adding the path to the environment variables, I ran the most basic example code:

from google_speech import Speech

# say "Hello World"
text = "Hello World"
lang = "en"
speech = Speech(text, lang)
speech.play()

# you can also apply audio effects while playing (using SoX)
# see http://sox.sourceforge.net/sox.html#EFFECTS for full effect documentation
sox_effects = ("speed", "1.5")
speech.play(sox_effects)

# save the speech to an MP3 file (no effect is applied)
speech.save("output.mp3")

The sentence is cut at the beginning and the end (half of the first and last words are never pronounced). Has anyone experienced that? If so, could you please help me solve this problem?

Note that this behavior only occurs when running speech.play(). The output.mp3 file is created correctly, and playing it on Windows doesn't cut any part of the sentence.

Best,
Rkt

REMARK: This happens both in a Jupyter Notebook and when running the code as a .py script.

I cannot reproduce this issue.

Are you using the latest SoX version?

You can try the pad effect, for example pad 0.5 0.5, to add half a second of silence at the beginning and the end of the sound.

Hi,

Thanks for your reply. I reproduced the issue on different Windows machines (I haven't been able to try on a Linux machine yet). Nevertheless, the pad effect that you propose solves the problem, so for my case that's enough! I was wondering if there is any documentation that explains how to use the effects in Python. I made it work by passing the effect's arguments as separate comma-separated strings, i.e. ("pad", "0.5", "0.5"), but is that the case for all the commands?

> I reproduced the issue on different Windows machines (I haven't been able to try on a Linux machine yet).

Unfortunately I don't have any Windows machine to check, but since the audio file is complete I suspect this is either due to your audio setup or maybe a bug in SoX only occurring on Windows.

> I made it work by passing the effect's arguments as separate comma-separated strings, i.e. ("pad", "0.5", "0.5"), but is that the case for all the commands?

Yes, that is how it works. I should probably add a note about it to the README and the command-line help.
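
A minimal sketch of that convention: each whitespace-separated word of a SoX effect, as it would be typed on the sox command line, becomes one string in the tuple. The helper name `sox_effect_args` is made up for illustration and is not part of google_speech; `shlex.split` comes from the Python standard library.

```python
import shlex


def sox_effect_args(effect):
    """Split a SoX effect, written as on the command line, into a tuple of strings.

    Hypothetical helper for illustration; not part of google_speech.
    """
    return tuple(shlex.split(effect))


# "pad 0.5 0.5" becomes ("pad", "0.5", "0.5"), the form used earlier in this thread.
pad_effect = sox_effect_args("pad 0.5 0.5")
speed_effect = sox_effect_args("speed 1.5")

# Usage with the library (needs google_speech, SoX and network access):
#   from google_speech import Speech
#   Speech("Hello World", "en").play(pad_effect)
```

All values stay strings, since they are handed to SoX as command-line arguments rather than interpreted by Python.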

Thank you very much! Could it also be the hand-added libmad DLL? The hyperlink provided on "https://pypi.org/project/google-speech/" is not active anymore. Do you have any reliable source?

Also, by the way, I'm trying to implement some speech features for a robot that is sometimes not connected to the internet. However, it is always connected when initializing the classes, so I wanted to "download" all the sentences it can say during initialization. It seems that only the next one is downloaded, right? Is there any way to force the download of all the Speech objects that are created before playing them?

> Thank you very much! Could it also be the hand-added libmad DLL? The hyperlink provided on "https://pypi.org/project/google-speech/" is not active anymore. Do you have any reliable source?

Sorry, I don't use Windows, so I don't know. Maybe look or ask on the SoX website?

> Also, by the way, I'm trying to implement some speech features for a robot that is sometimes not connected to the internet. However, it is always connected when initializing the classes, so I wanted to "download" all the sentences it can say during initialization. It seems that only the next one is downloaded, right? Is there any way to force the download of all the Speech objects that are created before playing them?

Something like this should work:

for segment in speech:
  segment.preLoad()
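
For the robot use case, that loop could be wrapped in an initialization step along these lines. This is only a sketch: `preload_all` and its `speech_factory` parameter are hypothetical names, and it assumes that preLoad() leaves the audio in the library's cache so that a later play() does not need the network.

```python
def preload_all(sentences, lang="en", speech_factory=None):
    """Create one Speech object per sentence and preload every segment of each.

    Hypothetical helper. `speech_factory` defaults to google_speech.Speech and
    exists only so the function can be exercised without network access.
    """
    if speech_factory is None:
        # Imported lazily so the module loads even without google_speech installed.
        from google_speech import Speech
        speech_factory = Speech

    speeches = {}
    for sentence in sentences:
        speech = speech_factory(sentence, lang)
        # Same loop as suggested above: fetch the audio of every segment now.
        for segment in speech:
            segment.preLoad()
        speeches[sentence] = speech
    return speeches


# Usage while the robot is still online (needs google_speech and network access):
#   phrases = preload_all(["Hello World", "Battery low"], "en")
#   phrases["Hello World"].play(("pad", "0.5", "0.5"))
```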

This happens for me too, on a Debian 10 based system on Raspberry Pi. pad 0 1 fixes it for me, though I can't figure out why. It's also beneficial to increase the MAX_SEGMENT_SIZE in __init__.py to 200 so there aren't cuts mid-sentence (for my use case). (I don't know what the actual maximum size is, but Google is letting me do at least 200.)

Files played normally with the SoX player don't seem to be cut off for me. I have SoX v14.4.2.

@afontenot
Can you provide an example of a sentence where the sound is cut?

Sure, although I don't think it matters, because it happens with every sentence.

Tonight, Partly cloudy, with a low around 20. South southwest wind 0 to 15 km/h.

Note that I discovered after posting this that there are some audio issues that seem to have cropped up in the last few months on the Raspberry Pi. This might be related, not GoogleSpeech's fault. https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=240819

Edit: just tried aplay, I see the problem there too. Just hadn't noticed it before because my test files had enough of a pause at the end. Sorry for the noise, probably not this issue. I'm surprised that the Pi folks have let audio be broken for months on their platform though.

The thing is that I currently remove a few ms of audio at the beginning and the end of each file when playing, to avoid unnatural pauses when chaining several segments.
So if the audio playback chain also removes a few hundred ms, this is more audible because there is little or no silence left to cut from.
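
The interaction described above can be put into rough numbers. The sketch below is only a toy model with made-up durations, not values taken from GoogleSpeech or SoX, and it assumes any pad is added after the library's own trim. `speech_lost_ms` is a hypothetical name.

```python
def speech_lost_ms(silence_ms, library_trim_ms, playback_cut_ms, pad_ms=0):
    """Estimate how many ms of actual speech are clipped at one end of a segment.

    Toy model with hypothetical inputs:
      silence_ms       silence present in the downloaded audio at that end
      library_trim_ms  audio removed by the library to avoid pauses between segments
      playback_cut_ms  audio swallowed by the playback chain (driver, device)
      pad_ms           silence added by a pad effect, assumed to come after the trim
    """
    # Silence that survives the library's own trim (never below zero).
    remaining_silence = max(0, silence_ms - library_trim_ms)
    # Speech already clipped by the trim if the silence was shorter than the trim.
    lost_to_trim = max(0, library_trim_ms - silence_ms)
    # The playback chain eats padding and remaining silence first, then speech.
    lost_to_playback = max(0, playback_cut_ms - pad_ms - remaining_silence)
    return lost_to_trim + lost_to_playback


# Healthy playback chain: the trim only removes silence.
print(speech_lost_ms(150, 100, 0))
# Playback chain that swallows 300 ms: part of the first word is lost.
print(speech_lost_ms(150, 100, 300))
# Same chain with 500 ms of padding: the padding absorbs the cut.
print(speech_lost_ms(150, 100, 300, pad_ms=500))
```

In this model the pad does not repair the playback chain; it only gives it silence to discard instead of speech, which matches the workaround reported earlier in the thread.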