⚠️ Important Notice: This repository is provided as-is, without active maintenance or support. While the code is functional, I cannot provide fixes or updates. Users are welcome to fork the repository and make their own modifications. Pull requests are welcome but must be thoroughly tested and documented.
A real-time speech-to-text dictation tool for Linux, powered by Whisper models (local or cloud), with support for ElevenLabs TTS and Ollama LLM integration. While primarily tested on Fedora 40 (the distribution used by Linus Torvalds himself), it should theoretically work on any Linux distribution with the proper dependencies installed.
demo-2024-12-15.14-57-18.mp4
Watch the demo above to see Linux Dictation in action, featuring real-time speech-to-text, voice commands, and AI-powered text improvements.
- Real-time speech-to-text conversion using Whisper
- Text-to-Speech capabilities via ElevenLabs
- LLM-powered chat mode and text improvement using Ollama
- Support for multiple Whisper models
- Voice activity detection for improved accuracy
- Automatic text insertion into active window using ydotool or xdotool
- Configurable voice commands and ignored phrases
- Multiple operation modes: dictation, chat, and proofreading
- Linux (primarily tested on Fedora 40)
- Python 3.11 or higher
- PortAudio development libraries
- ydotool or xdotool for text input
- NVIDIA GPU (optional, for GPU acceleration)
- Ollama (optional, for LLM features)
- ElevenLabs API key (optional, for TTS features)
- Whisper instance (can be local Docker container, OpenAI API, or any compatible endpoint)
-
Clone the repository:
git clone https://github.com/mysticaltech/linux_dictation.git
cd linux_dictation
-
Install system dependencies:
sudo dnf install python3-pip python3-devel portaudio-devel ydotool xdotool
-
Install Poetry (if not already installed):
curl -sSL https://install.python-poetry.org | python3 -
-
Install project dependencies:
poetry install
-
Configure environment variables:
cp .env.example .env
# Edit .env with your API keys and preferences
- WHISPER_MODEL: Choose the Whisper model (default: "faster-distil-whisper-large-v3")
- WHISPER_BASE_URL: Whisper API endpoint (can be local Docker container, OpenAI API, or any compatible service)
- ELEVENLABS_API_KEY: Your ElevenLabs API key
- ELEVENLABS_VOICE_ID: Your chosen ElevenLabs voice ID
- OLLAMA_API_URL: Ollama API endpoint
- OLLAMA_MODEL: Your chosen Ollama model
- OLLAMA_TIMEOUT: API timeout in seconds
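As a rough illustration of how these settings could be read at startup, the sketch below collects them from the environment using only the standard library. Only the WHISPER_MODEL default comes from this README; the `Settings` class, the `load_settings` function, and every other default value are assumptions made for the example and may not match what main.py actually does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in .env."""
    whisper_model: str
    whisper_base_url: str
    elevenlabs_api_key: Optional[str]
    elevenlabs_voice_id: Optional[str]
    ollama_api_url: str
    ollama_model: Optional[str]
    ollama_timeout: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping."""
    return Settings(
        # Default taken from the README.
        whisper_model=env.get("WHISPER_MODEL", "faster-distil-whisper-large-v3"),
        # Assumed default: a local OpenAI-compatible server.
        whisper_base_url=env.get("WHISPER_BASE_URL", "http://localhost:8000/v1"),
        # Empty strings are treated as "not configured".
        elevenlabs_api_key=env.get("ELEVENLABS_API_KEY") or None,
        elevenlabs_voice_id=env.get("ELEVENLABS_VOICE_ID") or None,
        # 11434 is Ollama's default port; the base-URL form is an assumption.
        ollama_api_url=env.get("OLLAMA_API_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL") or None,
        # Assumed default of 30 seconds.
        ollama_timeout=float(env.get("OLLAMA_TIMEOUT", "30")),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the real environment, for example `load_settings({"OLLAMA_TIMEOUT": "10"})`.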
-
Start the dictation service:
./start.sh
Or manually:
poetry run python main.py
-
Available voice commands:
- "pause dictation" - Pause transcription
- "resume dictation" - Resume transcription
- "chat mode" - Switch to interactive LLM chat mode
- "dictation mode" - Switch to standard dictation mode
- "read aloud" - TTS reading of selected text
- "make awesome" - Improve selected text using LLM
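One way the phrases above could be recognized in a transcript is sketched below. This is not taken from the project's source: the action names, `normalize`, and `match_command` are hypothetical, and the sketch assumes a command only counts when it is the entire utterance.

```python
import re
from typing import Optional

# Phrases from the list above mapped to hypothetical action names.
VOICE_COMMANDS = {
    "pause dictation": "pause",
    "resume dictation": "resume",
    "chat mode": "set_mode_chat",
    "dictation mode": "set_mode_dictation",
    "read aloud": "read_selection",
    "make awesome": "improve_selection",
}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def match_command(transcript: str) -> Optional[str]:
    """Return the action name if the whole utterance is a command, else None."""
    return VOICE_COMMANDS.get(normalize(transcript))
```

Normalizing first matters because speech-to-text output often adds capitalization and trailing punctuation, so "Pause dictation." should still match. Requiring an exact match keeps ordinary sentences that merely contain a command phrase from triggering it.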
-
Operation Modes:
- Dictation Mode: Standard speech-to-text
- Chat Mode: Interactive conversations with LLM
- Proofreading Mode: Text improvement and suggestions
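The three modes can be thought of as a small dispatch table: the active mode decides where transcribed text goes. The sketch below is an illustration only; the `Mode` enum, `route`, and the placeholder handlers are assumptions and do not reflect the project's actual code.

```python
from enum import Enum
from typing import Callable, Dict


class Mode(Enum):
    DICTATION = "dictation"
    CHAT = "chat"
    PROOFREADING = "proofreading"


def route(mode: Mode, text: str, handlers: Dict[Mode, Callable[[str], str]]) -> str:
    """Send text to the handler registered for the active mode."""
    return handlers[mode](text)


# Placeholder handlers that only tag the text; a real tool would type it
# into the active window, send it to the LLM, or request an improved version.
handlers = {
    Mode.DICTATION: lambda t: f"type:{t}",
    Mode.CHAT: lambda t: f"ask-llm:{t}",
    Mode.PROOFREADING: lambda t: f"improve:{t}",
}
```

With this layout, switching modes is just a matter of changing which `Mode` value is passed to `route`.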
-
Press Ctrl+C in the terminal to stop the application.
Text-to-Speech (ElevenLabs):
- Requires ElevenLabs API key
- Supports reading selected text aloud
- Configurable voice and model settings
LLM Integration (Ollama):
- Requires Ollama installation
- Supports chat mode for interactive conversations
- Text improvement and proofreading capabilities
Text Input:
- Primary: ydotool for Wayland support
- Fallback: xdotool for X11 compatibility
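The primary/fallback arrangement can be sketched as a lookup on PATH followed by a call to whichever tool is found. This is a minimal sketch under assumptions: `pick_tool`, `build_type_command`, and `type_text` are hypothetical names, and the project may choose between the tools differently (for example by inspecting the session type).

```python
import shutil
import subprocess
from typing import Callable, List, Optional

# Primary first, then fallback, as described above.
TOOL_PREFERENCE = ("ydotool", "xdotool")


def pick_tool(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first preferred tool found on PATH, or None."""
    for tool in TOOL_PREFERENCE:
        if which(tool):
            return tool
    return None


def build_type_command(tool: str, text: str) -> List[str]:
    """Both tools provide a `type` subcommand that takes the text to enter."""
    return [tool, "type", text]


def type_text(text: str) -> bool:
    """Type text into the active window; False if no tool or the call failed."""
    tool = pick_tool()
    if tool is None:
        return False
    # Note: ydotool also needs its ydotoold daemon running to inject input.
    result = subprocess.run(build_type_command(tool, text), check=False)
    return result.returncode == 0
```

Passing the command as a list rather than a shell string avoids quoting problems when the dictated text contains quotes or other shell metacharacters.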
-
Audio Input Issues:
- Check microphone settings in system settings
- Verify microphone permissions
- Test microphone with pavucontrol
-
Text Input Problems:
- Check ydotool service status
- Verify xdotool installation
- Check input method compatibility
-
LLM/TTS Issues:
- Verify API keys in .env
- Check Ollama service status
- Confirm network connectivity
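For the Ollama check in particular, a quick probe of its HTTP API can confirm that the service is up and reachable. The sketch below assumes OLLAMA_API_URL holds a base URL such as http://localhost:11434 (Ollama's default port); `tags_url` and `ollama_reachable` are hypothetical helper names, not part of this project.

```python
import urllib.error
import urllib.request


def tags_url(base_url: str) -> str:
    """Ollama lists locally installed models at /api/tags."""
    return base_url.rstrip("/") + "/api/tags"


def ollama_reachable(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> bool:
    """Return True if the Ollama API answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False
```

If `ollama_reachable()` returns False while the service is supposedly running, the URL in .env or a firewall rule is a likely place to look next.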
Contributions are welcome! Please feel free to submit issues and pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.