espeak-ng/espeak-ng-ios-app

Noise artifacts between batches of text

parhamdoustdar opened this issue · 41 comments

Right now, when VoiceOver speaks text in batches, a very short fragment of the previously spoken text is played between them.

This might be hard to explain, so here are the steps to reproduce:

  1. With the eSpeak voice selected, lock your phone
  2. Swipe right until you land on the "Show notifications" button
  3. Notice that you can hear a small part of the "s" sound between "Show notifications" and "button"

Let me know if this only happens for me, and I can share an audio/screen recording.

@parhamdoustdar I can't reproduce this with the language set to Persian and the voice set to Max. I remember encountering this issue on Windows a few months ago with other eSpeak voices that are rather echoey, but Max seems to be unaffected.

Interesting - I'm also using 16.1 but can't duplicate it.

That's it: I can reproduce it at a rate of 80% and above. I was using 65%.

I can reproduce this too, and I suspect it might be related to #6. It seems like some buffer is not flushed properly, though that is just a wild guess.

What is interesting here is that it mostly occurs when VoiceOver speaks one chunk of text, adds a 5S pause, and then sends another chunk without interrupting the first one. It also happens when you listen to one string until the end and then to a second one. From what I can tell so far (though I have only tested with the iPhone's internal speakers), it does not occur if you swipe through items very fast, so that each ongoing string is deliberately interrupted. That would support my theory.

Maybe it's related: I hear weird clicks when moving really fast, or even at normal speed.
It's so annoying that I stopped using eSpeak.

Can you check 1.0(6)?

@djphoenix This is still reproducible with 1.0(6).

Oh... I will look into it more closely.

I am having the issue too: I can go up to 76% in VoiceOver, but at 77% and above the audio skips.
Furthermore, the pronunciation of some words changes at faster rates, and vowels seem more drawn out. The word "rate" sounds more like "raaate", with the A lasting longer at faster rates.

Also, with some voices, a click plays about a second after speech naturally stops.
I'm using the latest beta, 06.

It's interesting that in the eSpeak app the voice works properly at faster speech rates, but with VoiceOver it becomes choppy.

Good news: in the next version, settings such as word rate and pitch will be shared between the app and VoiceOver, so you can leave the VoiceOver rate at its default and tune the speed natively in the eSpeak core.

Hello,
I think that if you are planning to do this, you should give us a setting to choose whether we want to be able to change the rate from VoiceOver or from the app.
There are advantages to both approaches.

With VoiceOver, blind people can change the rate quickly on the fly with the rotor gesture, so it's possible to make quick adjustments as you are reading specific things.
On the other hand, the app offers much faster speeds, which is also quite beneficial.

The key problem is that VoiceOver adjusts speed by resampling the recorded audio, whereas eSpeak changes the rate at the synthesis stage, so eSpeak's rate change sounds clearer. Unfortunately, I haven't found a way to read the VoiceOver setting so that it can be handled properly in the audio unit.

In the end, you can of course change the word rate in eSpeak and then adjust the rate up or down with VoiceOver; that adjustment will be applied after synthesis as well.

Ah, got it. If that is how it will work, it's completely fine; it works the same way in the Android app, i.e. you can adjust the speech rate in the app and also via the screen reader.

I had misunderstood and thought the only way to change the rate would be via the app.

Not released yet. I'm waiting for a PR to be merged in espeak-ng to fix #16.

Please check 1.0(7) for iOS

@djphoenix For me, with the latest release, eSpeak doesn't appear in VoiceOver at all.

It happened in the past as well, but usually restarting the phone or opening the app again would fix it. Not this time.

Oh... I'm not sure how to fix it for everyone. I tried both updating from the previous version and a clean install, and it works for me in both cases.

Can you try completely removing the app, then installing it from TestFlight, opening it, and then checking the VoiceOver voice list?

Never mind, it just appeared. It sometimes seems to take a long time for eSpeak to appear after updating, but at least it works.

From some quick testing, the original issue reported here seems much better, but you have to set the rate in the app to about 460.
If I do this, no matter what I configure in VoiceOver as the rate, I don't experience this issue anymore.

@parhamdoustdar I should ask you as the author of this thread: is this fixed for you now? The latest build seems unaffected.

@paxcoder Does it reproduce in the app and/or the console tool? Send me an example phrase and your settings so I can check it.

@paxcoder increase the speech rate in the app to at least 460, and if this is too fast for you, decrease it using VoiceOver's rotor afterwards. When I do this, the last phoneme is no longer added to the next spoken utterance.
Can you reproduce it if you do this as well?

@djphoenix Normally this can't be reproduced in the app; it happens only with VoiceOver, and even then at least two different strings need to be announced. For example, if VoiceOver says "rate" and you move to the next option, "pitch", you will first hear the last phoneme of the previously spoken "rate" for a very short moment, and then the new word, "pitch".
This happens only if the VoiceOver rate is set to 80 or above, and, as I said, it no longer seems to happen if the app rate is 460 or above; in that case any VoiceOver rate works fine.

A slight correction: this appears to work from 450, not 460. A rate of 449 causes the issue, but from 450 up it's fine.

Oh gosh... @nidza07 I think I can reproduce it now. I couldn't before because I never raised the word rate above 300 wpm; beyond that it becomes completely unintelligible to me. I really don't know how blind people use screen readers at 400 wpm and over...

OK, now I know there is a range where the bug appears: the upper bound is 449, and the lower bound is somewhere above 300. The question is why...

If I'm understanding this correctly, it looks like this is a bug in eSpeak:

The desired rate (e.g. the one set in VO) comes through in the SSML as a percentage. Example:

<speak><prosody rate="274.46155%"><break time="60ms" />            Untitled 8 <break time="60ms" />            window</prosody></speak>

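eSpeak takes that percentage and sets the rate relative to the current rate.
Rates above 449 activate sonic (the speed-up library eSpeak uses for very fast rates). My guess is that activating it this way behaves differently from activating it by calling the eSpeak functions directly.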
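To make the hypothesis above concrete, here is a small, hypothetical model of the rate arithmetic. It assumes eSpeak simply multiplies the current rate by the SSML percentage and that sonic engages above 449 wpm, as described in this thread; the function names are illustrative and are not eSpeak-ng's API. It reports whether sonic would already be active from the configured rate or would be switched on only by the SSML prosody percentage.

```python
# Hypothetical model of the rate arithmetic described in this thread.
# Assumptions: eSpeak scales the current rate by the SSML percentage with a
# plain multiplication, and sonic engages above 449 wpm. Not eSpeak-ng code.
SONIC_THRESHOLD_WPM = 449


def effective_rate(base_wpm: float, prosody_percent: float) -> float:
    """Scale the engine's current rate by an SSML <prosody rate="N%"> value."""
    return base_wpm * prosody_percent / 100.0


def sonic_trigger(base_wpm: float, prosody_percent: float) -> str:
    """Report which path, if any, pushes the rate into sonic's range."""
    if base_wpm > SONIC_THRESHOLD_WPM:
        return "configured rate"  # sonic already on before any SSML is applied
    if effective_rate(base_wpm, prosody_percent) > SONIC_THRESHOLD_WPM:
        return "ssml prosody"  # only the SSML percentage crosses the threshold
    return "none"  # sonic never engaged


if __name__ == "__main__":
    for base in (175, 300, 449, 450):
        rate = effective_rate(base, 274.46155)
        print(f"base {base} wpm -> {rate:.1f} wpm, sonic via: {sonic_trigger(base, 274.46155)}")
```

Under these assumptions, with the 274.46155% from the example any app rate up to 449 reaches sonic only through the SSML path, which would be consistent with the observation that an app rate of 450 or more avoids the artifact.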

This is easy to reproduce with espeak-ng alone. Run espeak -m and enter these two lines:

<speak><prosody rate="274.46155%"><break time="60ms" />            Untitled 8 <break time="60ms" />            window</prosody></speak>
<speak><prosody rate="274.46155%"><break time="1000ms" /> test</prosody></speak>

As soon as you hit enter on the second line, you'll hear a bit of audio at the beginning.
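The same reproduction can be scripted. The sketch below is a minimal example that assumes Python 3.9+, an espeak-ng build that predates the fix, and a binary named `espeak-ng` on the PATH (the thread invokes it as `espeak`, so pass `binary="espeak"` if that is what your system provides). It relies on espeak-ng speaking stdin one line at a time when no text argument is given, and it assumes that piping the two lines back to back, rather than typing them, still triggers the artifact. `build_stdin` and `run_repro` are hypothetical helper names.

```python
import shutil
import subprocess

# The two SSML lines from the report above, unchanged.
UTTERANCES = [
    '<speak><prosody rate="274.46155%"><break time="60ms" />            Untitled 8 <break time="60ms" />            window</prosody></speak>',
    '<speak><prosody rate="274.46155%"><break time="1000ms" /> test</prosody></speak>',
]


def build_stdin(utterances: list[str]) -> str:
    """Join utterances one per line; espeak-ng speaks stdin line by line."""
    return "".join(line + "\n" for line in utterances)


def run_repro(binary: str = "espeak-ng") -> bool:
    """Feed both lines to one espeak-ng process; return False if it is missing."""
    path = shutil.which(binary)
    if path is None:
        return False
    # -m: interpret SSML markup in the input text. Using a single process means
    # any state left over from the first utterance can leak into the second.
    subprocess.run([path, "-m"], input=build_stdin(UTTERANCES), text=True, check=True)
    return True


if __name__ == "__main__":
    if not run_repro():
        print("espeak-ng not found on PATH; nothing was played")
```

Running the two utterances in separate processes would not be a fair test, since a fresh process starts with clean state.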

@tspivey thank you a lot for your sample!

It's really clear now that there is an issue in the core library. I will track it down (and maybe fix it myself, haha).

Should be fixed in 1.0(9) for iOS and 1.0(4) for macOS. @XP-Fan @tspivey @nidza07 please check.

@paxcoder It was a tricky road, haha. We are on the home straight.

I confirm that this bug is fixed.