(Note that it doesn't differentiate "i'm" from "um" very well, so I flagged "i'm" as well.)
- Clone repo
- Download the speech recognition model and save it to the `um_detector` directory (rename the downloaded model folder to "model"). Here's a setup tutorial and the model's alphacephei API.
- Edit the bad words in `run.py` (or leave as is)
- Run `pip install vosk; pip install sounddevice` in a terminal
- Run `run.py` (a rough sketch of what it does is shown after these steps)
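For reference, here is a minimal sketch of the kind of loop `run.py` implements: microphone audio is streamed through `sounddevice` into a Vosk recognizer, and a warning is printed whenever a flagged word shows up in a finalized result. The word list, sample rate, and queue handling below are illustrative assumptions, not the repo's exact code.

```python
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

# Words to flag; "i'm" is included because Vosk often hears "um" as "i'm".
# This set is an assumption -- edit it to match the bad words in run.py.
BAD_WORDS = {"um", "uh", "i'm", "like"}

SAMPLE_RATE = 16000
audio_q = queue.Queue()

def callback(indata, frames, time, status):
    """sounddevice calls this from its own thread for each captured block."""
    audio_q.put(bytes(indata))

model = Model("model")  # the folder you renamed to "model"
recognizer = KaldiRecognizer(model, SAMPLE_RATE)

with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=8000, dtype="int16",
                       channels=1, callback=callback):
    print("Listening... press Ctrl+C to stop.")
    while True:
        data = audio_q.get()
        if recognizer.AcceptWaveform(data):
            # A finalized chunk of speech; check it for flagged words.
            text = json.loads(recognizer.Result()).get("text", "")
            for word in text.split():
                if word in BAD_WORDS:
                    print(f"Filler word detected: {word!r}")
```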
The reason for choosing this API is that it can process words in real time, offline. To get live feedback on a presentation or Zoom call, offline speech recognition is required. Other packages, like SpeechRecognition, are too slow and don't categorize "um" as a word.
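The real-time behaviour comes from Vosk's streaming interface: besides finalized results, the recognizer exposes a running partial hypothesis while you are still speaking. The small helper below is a hypothetical illustration of that (the `feed` function is not part of the repo):

```python
import json
from vosk import Model, KaldiRecognizer

recognizer = KaldiRecognizer(Model("model"), 16000)

def feed(chunk: bytes) -> str:
    """Feed one raw 16 kHz audio chunk; return whatever text Vosk can offer now."""
    if recognizer.AcceptWaveform(chunk):
        # Utterance finished: the finalized text for this segment.
        return json.loads(recognizer.Result()).get("text", "")
    # Still mid-utterance: Vosk's current best guess, updated continuously.
    return json.loads(recognizer.PartialResult()).get("partial", "")
```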
- Make the i'm vs. um distinction more robust
- Add feedback on tone shift
- Provide a post-talk summary of the presentation (most/least frequent words, total time not speaking, etc.); a rough sketch of one possible approach is below
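As a starting point for the summary idea, here is a rough sketch of how a post-talk summary could be computed, assuming the recognition loop collects each finalized segment's text as it runs. The `segments` structure and the per-word timing estimate are assumptions for illustration, not existing code in the repo.

```python
from collections import Counter

def summarize(segments, talk_length_s):
    """segments: list of finalized text strings collected during the talk.
    talk_length_s: total duration of the talk in seconds (assumed known)."""
    words = [w for text in segments for w in text.split()]
    counts = Counter(words)

    # Rough estimate of time spent not speaking: total talk length minus the
    # time covered by recognized speech (assuming roughly 0.4 s per spoken word).
    est_speaking_s = 0.4 * len(words)
    silence_s = max(0.0, talk_length_s - est_speaking_s)

    return {
        "most_frequent": counts.most_common(5),
        "least_frequent": counts.most_common()[-5:],
        "estimated_time_not_speaking_s": round(silence_s, 1),
    }
```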
Any ideas/modifications/comments are welcome.