A simple collection of scripts that provide speech-to-text functionality across the desktop. Press a hotkey to start recording, press it again to stop, and have your speech rendered to high-quality text quickly and effortlessly.
I wrote this project in an hour or so in an attempt to experiment with the OpenAI API and some OSX automation APIs in my free time.
This project almost kinda works, but it's not great. I have it set up so that I can start recording with a hotkey combination (Ctrl-Shift-7) and stop recording with the same combination. The resulting audio clip is then sent to the OpenAI Whisper API, where it's transcribed, and the transcription is written through a virtual keyboard to whatever program I currently have open.
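The toggle flow described above can be pictured with a minimal sketch. This is not the project's actual code: the class name `DictationToggle` and the four injected callables are hypothetical, chosen only to show the order of the steps (start, stop, transcribe, type) without tying the example to any particular audio or keyboard library.

```python
from typing import Callable, Optional


class DictationToggle:
    """Hypothetical helper: the first hotkey press starts recording,
    the second stops it, transcribes the clip, and types the result."""

    def __init__(
        self,
        start_recording: Callable[[], None],
        stop_recording: Callable[[], bytes],
        transcribe: Callable[[bytes], str],
        type_text: Callable[[str], None],
    ) -> None:
        self._start_recording = start_recording
        self._stop_recording = stop_recording
        self._transcribe = transcribe
        self._type_text = type_text
        self.recording = False

    def on_hotkey(self) -> Optional[str]:
        """Call this each time the hotkey combination fires."""
        if not self.recording:
            # First press: begin capturing audio and wait for the next press.
            self._start_recording()
            self.recording = True
            return None
        # Second press: stop, send the clip off, and type what comes back.
        self.recording = False
        audio = self._stop_recording()
        text = self._transcribe(audio)
        self._type_text(text)
        return text
```

Passing the steps in as plain callables keeps the toggle logic testable with stand-ins, which is handy given how awkward the microphone and Accessibility permissions are to set up.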
However, OSX's protections mean that any program you have open when you run that automation needs both microphone access and Accessibility API access, and providing those can be a pain. It's also a bit slow to begin recording, so it's not clear when you should start talking after pressing the hotkeys.
I'm basically giving up on this for now, but it does mostly work.
These scripts require:
Brew
portaudio: $ brew install portaudio (required for pyaudio)
Python Modules
pyautogui: to create keystrokes in the GUI from a string of text
openai: to send audio to the OpenAI Whisper API
pyaudio: for capturing audio from the active microphone
openai>=0.27.0: transcribing audio to text
psutil: finding an actively running process
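As a rough illustration of how the modules listed above could map onto the steps, here is a minimal sketch. It is not the project's actual code. The function names, the fixed-length recording (the real scripts stop on a second hotkey press), the 16 kHz mono format, and the `is_running` helper are all assumptions. The `openai.Audio.transcribe` call is the pre-1.0 interface that matches the `openai>=0.27.0` pin; 1.x releases of the library replaced it with a client object. Third-party imports are kept inside the functions so the file loads even when they are not installed.

```python
import wave

RATE = 16000      # samples per second (assumed value)
CHANNELS = 1      # mono
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit audio
CHUNK = 1024      # frames read from the microphone per call


def record_frames(seconds: float) -> list:
    """Capture raw 16-bit mono audio from the default microphone (pyaudio)."""
    import pyaudio

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    try:
        return [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * seconds))]
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()


def save_wav(frames: list, path: str) -> None:
    """Write the captured frames to a WAV file using the standard library."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(RATE)
        wav.writeframes(b"".join(frames))


def transcribe(path: str) -> str:
    """Send the clip to the Whisper API (openai pre-1.0 interface).
    Assumes an API key has already been configured."""
    import openai

    with open(path, "rb") as audio_file:
        return openai.Audio.transcribe("whisper-1", audio_file)["text"]


def type_text(text: str) -> None:
    """Type the transcription into the focused window (pyautogui)."""
    import pyautogui

    pyautogui.write(text)


def is_running(process_name: str) -> bool:
    """Hypothetical helper: check whether a named process is alive (psutil)."""
    import psutil

    return any(p.info["name"] == process_name
               for p in psutil.process_iter(["name"]))
```

Chained together, a fixed five-second dictation would look like `save_wav(record_frames(5), "clip.wav")` followed by `type_text(transcribe("clip.wav"))`.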
With the release of Whisper, speech-to-text is good enough that it should be ubiquitous. However, most users of the Apple ecosystem are still forced to suffer through Siri's crappy speech-to-text functionality. This is unacceptable.
I created these scripts so that I could have access to world-class speech-to-text functionality anywhere on my desktop at any time.
Ideally this would run entirely locally using the WhisperCpp project. However, at the time I created this project I was running on Intel silicon, and inference was too slow to be practical.