This is a project I built around Mozilla's DeepSpeech speech recognition engine.
It should work on both Windows x86_64 and Linux x86_64, but it will not work on 64-bit Raspberry Pi OS.
It is called Guinan, after the Star Trek character of the same name.
DeepSpeech is a speech recognition engine built on top of Google's TensorFlow framework.
It allows for local (offline) speech recognition, so you don't have to connect to an online API to perform decent speech recognition.
I made this so that I can integrate it into other projects, such as my upcoming T-800 project. My previous projects, Nvidianator and EDITH glasses, used wit.ai for speech recognition, which is effective but requires API keys and internet access.
Install the requirements with:
pip install -r requirements.txt
Run guinan/utils/model_dowloader.py to get the pre-trained conformer model.
Then you can run integrate_stt.py and see whether it can convert speech to text (make sure you have a microphone connected).
Once you have cloned the repository from GitHub, you can run the commands above.
If you want to integrate Guinan into another system, such as a robot, you can clone the repo into the folder of the project you are working on, then append the path of the Guinan folder to your project's Python system path:
import os
import sys
guinan_dir = os.path.abspath(path_to_guinan)  # path_to_guinan: location of the cloned Guinan folder
sys.path.append(guinan_dir)
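If you do this path setup in more than one place, it can help to wrap it in a small helper. The following is a minimal sketch rather than part of Guinan itself: the function name add_guinan_to_path is hypothetical, and it assumes you pass in the location where you cloned the repo. It checks that the folder exists and avoids adding the same entry twice.

```python
import os
import sys


def add_guinan_to_path(path_to_guinan):
    """Append the Guinan folder to sys.path once and return its absolute path.

    add_guinan_to_path is a hypothetical helper, not something Guinan provides.
    """
    guinan_dir = os.path.abspath(path_to_guinan)
    # Fail early with a clear message if the folder is not where we expect it.
    if not os.path.isdir(guinan_dir):
        raise FileNotFoundError(f"Guinan folder not found: {guinan_dir}")
    # Skip the append if the folder is already on the path.
    if guinan_dir not in sys.path:
        sys.path.append(guinan_dir)
    return guinan_dir
```

Call it once near the top of your program, before any Guinan imports, for example add_guinan_to_path("path/to/your/clone").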
You will also need to install everything in requirements.txt and/or add those requirements to your project's own requirements.txt. You can then go into the folder guinan/utils and run model_dowloader.py to get the pre-trained conformer model.
You can then test the setup by running integrate_stt.py and checking that it converts speech to text (make sure you have a microphone connected).
Then this can be integrated into another program by importing run_stt_inference from integrate_stt.py:
from guinan.integrate_stt import run_stt_inference
This function can then be called from your program to record audio and get the text output.
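As an illustration of how the text output might be used in a robot project, here is a rough sketch that maps a transcript to a known command phrase. The signature of run_stt_inference is not described here, so this sketch assumes it takes no arguments and returns the recognised text as a string. The names listen_for_command, first_matching_command and fake_stt are hypothetical, and fake_stt stands in for run_stt_inference so the example runs without a microphone.

```python
def first_matching_command(transcript, commands):
    """Return the first command phrase found in the transcript, or None.

    Command phrases are expected to be lowercase.
    """
    text = transcript.lower().strip()
    for phrase in commands:
        if phrase in text:
            return phrase
    return None


def listen_for_command(stt_fn, commands):
    """Call a speech-to-text function and map its text output to a command."""
    # Assumption: stt_fn takes no arguments and returns the recognised text.
    transcript = stt_fn()
    # Treat a missing or empty result as "nothing recognised".
    return first_matching_command(transcript or "", commands)


if __name__ == "__main__":
    # Stand-in for run_stt_inference so the sketch runs without a microphone.
    def fake_stt():
        return "Please turn left now"

    print(listen_for_command(fake_stt, ["turn left", "turn right", "stop"]))
```

If run_stt_inference matches the assumed signature, you could pass it in place of fake_stt. If it needs arguments, wrap it in a small function or lambda first.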