The Assistant project is a voice assistant application built on real-time speech-to-text (STT) and text-to-speech (TTS) technologies. It provides voice interaction capabilities, including wake word detection, voice activity detection, and real-time transcription, and it also leverages the scripting actions available on macOS.
- Voice Activity Detection (VAD): Automatically starts/stops recording based on the presence of speech.
- Wake Word Detection: Initiates recording when a specific wake word is detected.
- Real-Time Transcription: Provides immediate transcription of spoken words using the faster_whisper library.
- Event Callbacks: Customizable callbacks for various events such as recording start/stop, transcription updates, and wake word detection.
- Integration with Spotify: Controls Spotify playback, including play, pause, and volume adjustments.
- OpenInterpreter: The brain of the system, implemented in this codebase.
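As an illustration of how the Spotify integration could drive playback through macOS scripting, here is a minimal sketch that builds one-line AppleScript commands and runs them with `osascript`. The function names `spotify_script` and `run_spotify` and the action names are hypothetical and do not come from this codebase; the sketch assumes macOS with the Spotify desktop app installed.

```python
import subprocess
from typing import Optional

# Hypothetical mapping from assistant actions to Spotify AppleScript verbs.
_SPOTIFY_VERBS = {
    "play": "play",
    "pause": "pause",
    "toggle": "playpause",
}


def spotify_script(action: str, volume: Optional[int] = None) -> str:
    """Build a one-line AppleScript command for the Spotify app."""
    if action == "volume":
        # Spotify's scripting interface expects a volume between 0 and 100.
        if volume is None or not 0 <= volume <= 100:
            raise ValueError("volume must be an integer between 0 and 100")
        return f'tell application "Spotify" to set sound volume to {volume}'
    if action not in _SPOTIFY_VERBS:
        raise ValueError(f"unsupported action: {action!r}")
    return f'tell application "Spotify" to {_SPOTIFY_VERBS[action]}'


def run_spotify(action: str, volume: Optional[int] = None) -> subprocess.CompletedProcess:
    """Run the command through osascript (macOS only)."""
    script = spotify_script(action, volume)
    return subprocess.run(
        ["osascript", "-e", script],
        capture_output=True,
        text=True,
        check=False,  # the caller inspects returncode instead of catching an exception
    )
```

Keeping the script builder separate from the `osascript` call makes the command strings easy to check on any platform, while only `run_spotify` needs a Mac.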
- Configuration Management: Handles loading and storing configuration settings.
- Logging: Provides logging utilities for debugging and monitoring.
- Console Utilities: Functions for printing markdown and clearing the console.
- Audio Utilities: Functions for finding input devices and handling audio processing.
- Process Utilities: Functions for creating daemon processes.
- AudioToTextRecorder: Main class for handling audio recording, VAD, wake word detection, and transcription.
- Transcription: Uses faster_whisper for converting audio to text.
- Callbacks: Supports various callbacks for handling different states and events during recording and transcription.
Important
The AudioToTextRecorder class and its functionality in audio_recorder.py were implemented by Kolja Beigel. Only small modifications were made to support my requirements.
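To make the interplay between voice activity detection and callbacks concrete, here is a toy sketch of an energy-based gate that starts recording on a loud frame and stops after a run of quiet frames. It is not the AudioToTextRecorder implementation, which may use a different detector entirely; the class name `EnergyGate`, its parameters, and the callback names are assumptions made for illustration.

```python
import math
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]  # one chunk of audio samples, assumed to lie in [-1.0, 1.0]


class EnergyGate:
    """Toy VAD: start on a loud frame, stop after N consecutive quiet frames."""

    def __init__(
        self,
        threshold: float = 0.1,
        max_silent_frames: int = 3,
        on_recording_start: Optional[Callable[[], None]] = None,
        on_recording_stop: Optional[Callable[[List[Frame]], None]] = None,
    ) -> None:
        self.threshold = threshold
        self.max_silent_frames = max_silent_frames
        self.on_recording_start = on_recording_start
        self.on_recording_stop = on_recording_stop
        self.recording = False
        self._silent = 0
        self._frames: List[Frame] = []

    @staticmethod
    def rms(frame: Frame) -> float:
        """Root-mean-square energy of one frame."""
        if not frame:
            return 0.0
        return math.sqrt(sum(s * s for s in frame) / len(frame))

    def feed(self, frame: Frame) -> None:
        """Consume one frame and fire callbacks on state changes."""
        loud = self.rms(frame) >= self.threshold
        if not self.recording:
            if loud:
                self.recording = True
                self._silent = 0
                self._frames = [frame]
                if self.on_recording_start:
                    self.on_recording_start()
            return
        # While recording, keep every frame and count trailing silence.
        self._frames.append(frame)
        self._silent = 0 if loud else self._silent + 1
        if self._silent >= self.max_silent_frames:
            self.recording = False
            captured, self._frames = self._frames, []
            if self.on_recording_stop:
                self.on_recording_stop(captured)
```

In a real pipeline the frames passed to the stop callback would be handed to the transcription step; here the callbacks only need to be plain callables.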
- FastAPI Server: Provides endpoints for receiving text and speech commands.
- WebSocket Listener: Handles real-time communication with the client.
- Push-to-Talk Listener: Listens for a specific key press to start/stop recording.
- DBEventHandler: Monitors changes in the notification database and triggers scripts based on specific events.
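The idea of reacting to new rows in a notification database can be sketched with the standard `sqlite3` module. This is a polling approach under stated assumptions, not a description of how DBEventHandler works: the `notifications` table, its `id` and `app` columns, and the function names are all hypothetical.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple


def fetch_new_rows(conn: sqlite3.Connection, last_seen_id: int) -> List[Tuple[int, str]]:
    """Return (id, app) rows added after last_seen_id, oldest first."""
    cursor = conn.execute(
        "SELECT id, app FROM notifications WHERE id > ? ORDER BY id",
        (last_seen_id,),
    )
    return cursor.fetchall()


def dispatch_new(
    conn: sqlite3.Connection,
    last_seen_id: int,
    handlers: Dict[str, Callable[[int], None]],
) -> int:
    """Call the handler registered for each new row's app; return the new high-water mark."""
    for row_id, app in fetch_new_rows(conn, last_seen_id):
        handler = handlers.get(app)
        if handler is not None:
            handler(row_id)
        # Advance even when no handler matched, so rows are never revisited.
        last_seen_id = row_id
    return last_seen_id
```

A caller would invoke `dispatch_new` periodically, or whenever a file watcher reports that the database file changed, and keep the returned id between calls.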
- Clone the repository:
git clone https://github.com/g3ar-v/assistant.git
- Install dependencies:
poetry install
Configuration settings are managed through a combination of default and user-specific configuration files. The main configuration file is device.conf, which can be found in the configuration directory.
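Layering user-specific settings over defaults is commonly done with a recursive merge. The sketch below works on already-parsed dictionaries because the on-disk format of device.conf is not described here; the function name `merge_config` and the example keys are hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict, Mapping


def merge_config(default: Mapping[str, Any], user: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict where user values override defaults, section by section."""
    merged: Dict[str, Any] = deepcopy(dict(default))
    for key, value in user.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them instead of replacing wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical keys, used only to show the override behaviour.
    defaults = {"listener": {"wake_word": "jarvis", "sample_rate": 16000}}
    overrides = {"listener": {"wake_word": "computer"}}
    print(merge_config(defaults, overrides))
```

Because the merge copies its inputs, the default settings stay untouched and can be reused when the user file is reloaded.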
Run the server:
assistant
or run the development loop:
assistant dev
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.
For any inquiries or support, please contact me at vfranktor@gmail.com.