Whisper Desktop is an Electron-based application that allows users to transcribe speech to text using OpenAI's Whisper model through the Groq API. It provides a simple interface for recording audio and automatically transcribing it into text, which can then be inserted into any active text input field.
TL;DR: With the magic that is Whisper and the speed of Groq's servers, I thought I'd spend a weekend making a tool that lets me dictate into any application on my computer.
- Global hotkey (Ctrl+Shift+Space, or Cmd+Shift+Space on macOS) to start/stop recording
- Real-time audio recording using the system microphone
- Transcription of recorded audio using the Whisper large-v3 model
- Automatic insertion of transcribed text into the active text input field
- System tray integration for easy access
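Under the hood, the transcription step comes down to one HTTP request. The sketch below assumes Groq's OpenAI-compatible endpoint (`https://api.groq.com/openai/v1/audio/transcriptions`) and the model id `whisper-large-v3`. The helper names `buildTranscriptionRequest` and `transcribe` and the `recording.webm` filename are hypothetical and not taken from this project's source. It needs Node 18+ (global `FormData`, `Blob` and `fetch`).

```javascript
// Minimal sketch: build (and optionally send) a Groq transcription request.
// Assumes Groq's OpenAI-compatible audio endpoint and Node 18+ globals.
const GROQ_TRANSCRIPTION_URL =
  "https://api.groq.com/openai/v1/audio/transcriptions";

// Hypothetical helper: wraps recorded audio bytes in multipart form data.
// Assumes WebM audio, which is what Chromium's MediaRecorder produces by default.
function buildTranscriptionRequest(audioBytes, apiKey) {
  const form = new FormData();
  // The endpoint expects the audio as a file part named "file".
  form.append("file", new Blob([audioBytes], { type: "audio/webm" }), "recording.webm");
  form.append("model", "whisper-large-v3");
  return {
    url: GROQ_TRANSCRIPTION_URL,
    init: {
      method: "POST",
      // No manual Content-Type: fetch adds the multipart boundary itself.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Hypothetical helper: sends the request and returns the transcript.
async function transcribe(audioBytes, apiKey) {
  const { url, init } = buildTranscriptionRequest(audioBytes, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Transcription failed: HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.text; // the default JSON response carries the transcript in "text"
}
```

Splitting request construction from sending keeps the part that depends on the API's shape easy to inspect without a network call.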
- Clone the repository:
git clone https://github.com/yourusername/whisper-desktop.git
cd whisper-desktop
- Install dependencies:
npm install
- Create a .env file in the root directory and add your Groq API key:
GROQ_API_KEY=your_api_key_here
To obtain your Groq API key, visit https://console.groq.com/keys.
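Projects like this usually load `.env` with the `dotenv` package, but I haven't confirmed that's what this one does. The dependency-free sketch below parses simple `KEY=value` lines and flags the two most common setup mistakes: a missing key, or the placeholder left in place. `parseEnv` and `checkGroqKey` are hypothetical names, not functions from this project.

```javascript
// Minimal .env parser: KEY=value lines, blank lines and # comments.
// Hypothetical stand-in for a library such as dotenv; it only strips one pair
// of matching quotes and ignores multi-line values and variable expansion.
function parseEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // skip malformed lines
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const first = value[0];
    const last = value[value.length - 1];
    if (value.length >= 2 && (first === '"' || first === "'") && last === first) {
      value = value.slice(1, -1); // strip surrounding quotes
    }
    vars[key] = value;
  }
  return vars;
}

// Returns a description of the problem, or null if the key looks usable.
function checkGroqKey(vars) {
  const key = vars.GROQ_API_KEY;
  if (!key) return "GROQ_API_KEY is missing from .env";
  if (key === "your_api_key_here") return "GROQ_API_KEY still has the placeholder value";
  return null;
}
```

In the main process you would call `parseEnv(fs.readFileSync(".env", "utf8"))` and surface any message from `checkGroqKey` in the console before the first transcription attempt.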
- Start the application:
npm start
- Press Ctrl+Shift+Space (or Cmd+Shift+Space on macOS) to start recording
- Speak into your microphone
- Press the same shortcut again to stop recording and start transcription
- The transcribed text is automatically inserted into the active text input field
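The single shortcut acts as a toggle: the first press starts recording, the second stops it and hands the audio off for transcription and insertion. Below is a minimal sketch of that state logic, kept free of Electron so it runs on its own. `createRecordingToggle` and the `recorder`, `transcribe` and `insertText` callbacks are hypothetical, not the project's actual functions.

```javascript
// Hypothetical toggle driving the start/stop/transcribe/insert cycle.
// recorder.start() begins capture; recorder.stop() resolves to the audio;
// transcribe(audio) resolves to text; insertText(text) types it out.
function createRecordingToggle({ recorder, transcribe, insertText }) {
  let state = "idle"; // "idle" | "recording" | "transcribing"

  return async function onHotkey() {
    if (state === "idle") {
      state = "recording";
      await recorder.start();
    } else if (state === "recording") {
      state = "transcribing";
      try {
        const audio = await recorder.stop();
        const text = await transcribe(audio);
        if (text) await insertText(text);
      } finally {
        state = "idle"; // re-arm even if transcription threw
      }
    }
    // Presses while "transcribing" fall through unchanged, so a slow
    // request cannot start an overlapping recording.
    return state;
  };
}
```

Ignoring presses during the "transcribing" phase is a design choice: the alternative, queuing a new recording, risks mixing up which transcript belongs to which press.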
- On macOS, you may need to grant the app permission to access your microphone. If prompted, allow microphone access in System Preferences > Security & Privacy > Privacy > Microphone (System Settings > Privacy & Security > Microphone on macOS Ventura and later).
- On Linux, ensure the necessary audio libraries are installed. On Ubuntu or other Debian-based systems, you might need to run:
sudo apt-get install libasound2-dev
- If you encounter issues with global shortcuts, you may need to install libxtst-dev:
sudo apt-get install libxtst-dev
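When the shortcut does nothing, it also helps to check whether it was registered at all. Electron's `globalShortcut.register(accelerator, callback)` returns `false` instead of throwing when registration fails, for example because another application already owns the combination. The sketch assumes the app registers the cross-platform accelerator `CommandOrControl+Shift+Space` in main.js; `registerToggleHotkey` is a hypothetical helper, and `globalShortcut` is passed in so the logic can be exercised outside Electron.

```javascript
// Hypothetical helper for main.js. In the real app, pass Electron's
// globalShortcut module (from require("electron")) as the first argument.
const TOGGLE_ACCELERATOR = "CommandOrControl+Shift+Space";

function registerToggleHotkey(globalShortcut, onToggle, log = console.error) {
  // register() reports failure through its boolean return value,
  // e.g. when another application already owns the combination.
  const ok = globalShortcut.register(TOGGLE_ACCELERATOR, onToggle);
  if (!ok || !globalShortcut.isRegistered(TOGGLE_ACCELERATOR)) {
    log(`Could not register global shortcut ${TOGGLE_ACCELERATOR}`);
    return false;
  }
  return true;
}
```

Call it once the app is ready (inside `app.whenReady()`), and call `globalShortcut.unregisterAll()` in the `will-quit` handler so the key combination is released when the app exits.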
The main components of the application are:
- main.js: The main Electron process
- renderer.js: The renderer process, which handles the UI and recording logic
- preload.js: Exposes Electron APIs to the renderer process
- index.html: The main application window
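The preload script is the bridge between the two processes: it runs with access to Electron's IPC and hands a narrow API to the renderer through `contextBridge.exposeInMainWorld`. The sketch below shows what that could look like for a hotkey-driven recorder. The `whisper` global and the channel names `toggle-recording` and `transcription-complete` are hypothetical, and `contextBridge`/`ipcRenderer` are passed in so the sketch runs without Electron.

```javascript
// Hypothetical preload.js logic. In preload.js itself you would write:
//   const { contextBridge, ipcRenderer } = require("electron");
//   exposeWhisperApi(contextBridge, ipcRenderer);
function exposeWhisperApi(contextBridge, ipcRenderer) {
  contextBridge.exposeInMainWorld("whisper", {
    // Main -> renderer: main.js forwards each global hotkey press.
    onToggleRecording(callback) {
      // Wrap the listener so the renderer never sees the raw IPC event.
      ipcRenderer.on("toggle-recording", () => callback());
    },
    // Renderer -> main: hand the transcript to main.js for insertion.
    sendTranscription(text) {
      ipcRenderer.send("transcription-complete", String(text));
    },
  });
}
```

The renderer would then use `window.whisper.onToggleRecording(...)` and `window.whisper.sendTranscription(...)` without needing direct access to Node or Electron modules.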
To modify the application:
- Make changes to the relevant files
- Restart the application to see the changes
To build the application for distribution:
npm run build
This creates distributable packages for your platform in the dist folder.
If you encounter any issues with audio recording or transcription:
- Ensure your microphone is properly connected and selected as the default input device
- Check the console logs for any error messages
- Verify that your Groq API key is correctly set in the .env file
- You may need to install Java if you don't already have it installed: https://www.java.com/download/ie_manual.jsp
The following are ideas for future development:
- Allow for using different providers (e.g. OpenAI or a self-hosted model)
- Include post-processing capability using LLM providers
- Make the UI more customizable
- Add a feature to save and manage transcription history
- Develop a mobile companion app for remote control and syncing
- Implement advanced audio processing for noise reduction and speaker separation
This was done in a weekend, so I don't have any specific plans to implement any of these yet.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.