T-vK/Termux-DeepSpeech

Recognition quality surprisingly bad?

navid-zamani opened this issue · 4 comments

I thought DeepSpeech was a good NN-based model.

I found no details about how well this is supposed to work, and I could not get it to recognize anything correctly even once.

The best recognition was also the most funny one: It turned “substance abuse is bad” into “substance the best”. 🤣

So I'm filing this issue as a request to add a link to the README where users can read about DeepSpeech (and possibly download other speech files, since the problem may be my accent 😇).

dp0s commented

Unfortunately, the speech recognition quality is bad for me too. It sometimes recognizes single words correctly, but never a complete sentence.
I don't have this problem with Google or Dicio; their recognition works fine.

T-vK commented

I agree it's really bad by any modern standard. But since it's developed by Mozilla, I suspect it's just a matter of bad setup/configuration on my part.

Google's speech recognition is proprietary and requires a remote backend. So it is not really comparable imo.

Dicio uses Vosk, which is comparable to DeepSpeech (open source, works offline). Vosk (at least as Dicio integrates it) yields far better results than Termux-DeepSpeech. Maybe someone should develop Termux-Vosk or something like that.

dp0s commented

Update: I found a way for reliable open source offline voice recognition in Termux.

1. Install Sayboard from F-Droid.
2. Go to settings and select Sayboard as the default voice input method.
3. Download the required Vosk models in Sayboard's settings.
4. Use the default termux-speech-to-text command.
5. Wait a couple of seconds, then speak slowly.

Step 2 is only possible thanks to the latest Sayboard update.

The recognition language can be configured in Sayboard's settings.

@dp0s: TermuxActivity just crashes here when I run termux-speech-to-text:

Unable to create service com.termux.api.apis.SpeechToTextAPI$SpeechToTextService: android.view.WindowManager$BadTokenException: Unable to add window -- token null is not valid; is your activity running?

After that, retrying the command does nothing until Termux is fully closed and restarted, no matter whether I say something or how long I wait; I have to interrupt it with Ctrl-C.
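Since the command can hang until it is interrupted with Ctrl-C, a script that calls it may want a timeout. The following is a minimal sketch, assuming Python is installed in Termux and that termux-speech-to-text (from the termux-api package, which also needs the Termux:API app) prints the recognized text to stdout. The function name `listen` and the 30-second default are hypothetical choices, not part of Termux:API. It does not fix the crash itself; it only stops a caller from blocking forever.

```python
import subprocess
import sys
from typing import Optional, Sequence


def listen(
    command: Sequence[str] = ("termux-speech-to-text",),
    timeout_s: float = 30.0,
) -> Optional[str]:
    """Run a speech-to-text command and return its stripped stdout, or None.

    Assumption: the command prints the recognized text to stdout.
    If it has not finished after timeout_s seconds, subprocess.run kills
    the child process (instead of leaving it hanging until Ctrl-C) and
    this function returns None.
    """
    try:
        result = subprocess.run(
            list(command),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child was killed by subprocess.run after the timeout.
        return None
    except FileNotFoundError:
        # Command not installed (e.g. the termux-api package is missing).
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    text = listen()
    if text is None:
        print("No result (timed out or command failed)", file=sys.stderr)
    else:
        print(text)
```

The command is a parameter so the timeout logic can be checked with any other program; if termux-speech-to-text turns out to print partial results as well, only the last line of the output may be the one you want.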

The app is enabled as a keyboard, but there seems to be no way to pick a default speech recognition app. Is it possible that this option is hidden when there is only one? I found no place to pick the default, so I can't even tell whether Sayboard is actually being used.

That being said, recognition works halfway acceptably in Sayboard itself. I'm not sure it is useful with such a tiny dictionary, though: I would have to speak unnaturally, like a (or to a) small child. It also has big trouble with German compound words and produces invalid grammar by splitting them into separate words, which gives them a very different meaning. (That mistake is common among functional illiterates; it makes one look really stupid, and a bit like a certain kind of radical nationalist too. So you probably understand why that might be a no-go. :))
There are probably bigger models, but I don't know how much RAM they use or whether they can even run on an average 2023 phone.

So I don't think the technology is ready yet. In the '90s I used software on a 66 MHz 486 that used 64 MB of RAM and did a better job.