A versatile, modular AI assistant framework that provides continuous assistance through voice interaction and natural language processing.
- Voice Interaction: Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities for natural interaction
- Modular Architecture: Easily extensible with new capabilities and integrations
- Multiple LLM Support: Compatible with various Large Language Models (Ollama, OpenAI, Anthropic, etc.)
- Customizable Prompts: Configure how the assistant responds with template-based prompting
- Voice Profiles: Customize the assistant's voice with predefined or custom voice profiles
- Logging System: Comprehensive logging with timestamped log files for each session
- macOS Integration: Easy installation with Globe/Fn key shortcut support on MacBook Pro
- Environment Configuration: Simple setup through environment variables and configuration files
- Cross-Platform: Works on macOS, Linux, and Windows
- Python 3.10+
- uv for dependency management (recommended)
- For TTS: speakers or headphones
- For STT: microphone
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/always-on-ai-assistant.git
  cd always-on-ai-assistant
  ```

- Install dependencies:

  ```bash
  uv pip install -r requirements.txt
  ```

- Create a `.env` file based on the sample:

  ```bash
  cp .env.sample .env
  ```

- Edit the `.env` file to configure your assistant.

- For speech recognition with Vosk (optional but recommended for offline use):

  ```bash
  # Run the setup script to download and install a Vosk model
  uv run setup_vosk_model.py

  # Or specify a different model size
  uv run setup_vosk_model.py --model medium
  ```
For MacBook Pro users who want to trigger the assistant with the Globe/Fn key:
```bash
# Run the macOS installer script
python install-to-mac.py

# Or specify a custom voice profile
python install-to-mac.py --voice-profile british

# Or specify a custom wake word
python install-to-mac.py --wake-word "hey computer"
```

The installer will:

- Copy the assistant files to `~/Applications/AlwaysOnAIAssistant`
- Install required dependencies
- Set up a launch agent to run at login
- Create a keyboard shortcut for the Globe/Fn key
- Install the Vosk model for offline speech recognition
After installation, follow the on-screen instructions to complete the keyboard shortcut setup.
Text-to-speech demo:

```bash
uv run live_tts_demo.py
```

Speech-to-text demo:

```bash
# Using SpeechRecognition (online)
uv run live_stt_demo.py

# Using Vosk (offline)
uv run live_stt_demo.py --engine vosk
```

Voice assistant demo:

```bash
# Basic usage
uv run voice_assistant_demo.py

# With wake word activation
uv run voice_assistant_demo.py --wake-word "hey assistant"

# With specific engines
uv run voice_assistant_demo.py --stt-engine vosk --tts-engine gtts

# With verbose logging
uv run voice_assistant_demo.py --verbose
```

The assistant supports customizable voice profiles that define the voice characteristics:
To see all available system voices:
```bash
uv run voice_assistant_demo.py --list-voices
```

The assistant comes with several predefined voice profiles:

```bash
# List available voice profiles
uv run voice_assistant_demo.py --list-profiles

# Use a specific voice profile
uv run voice_assistant_demo.py --voice-profile british
```

You can create custom voice profiles by adding JSON files to the `voices/` directory. See VOICES.md for details.
Example voice profiles:
pyttsx3 voice profile example:

```json
{
  "name": "My Custom Voice",
  "description": "Custom voice with specific settings",
  "engine": "pyttsx3",
  "voice_id": "com.apple.voice.compact.en-US.Samantha",
  "rate": 150,
  "volume": 0.9,
  "language": "en-US"
}
```

gTTS voice profile example (default engine):

```json
{
  "name": "My gTTS Voice",
  "description": "Google Text-to-Speech voice profile",
  "engine": "gtts",
  "language": "en-us",
  "tld": "com",
  "slow": false
}
```

Note: gTTS is now the default TTS engine due to its better reliability and to avoid the "run loop not started" error that can occur with pyttsx3.
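To use a custom profile, a file saved in the `voices/` directory can then be selected with `--voice-profile`. For example, assuming profiles are referenced by their JSON filename (as with the bundled `british` profile; see VOICES.md for the exact naming rules), a hypothetical `voices/my_custom_voice.json` would be loaded with `uv run voice_assistant_demo.py --voice-profile my_custom_voice`.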
The assistant includes a comprehensive logging system that records all activities:
- Logs are stored in the `logs/` directory
- Each session creates a timestamped log file (e.g., `voice_assistant_2025-03-03_00-51-23.log`)
- Logs include information about:
  - Voice profile and engine settings
  - Speech recognition events
  - LLM queries and responses
  - Errors and warnings
To enable verbose logging:
```bash
uv run voice_assistant_demo.py --verbose
```

The assistant supports two speech recognition engines:
SpeechRecognition (online):

- Uses Google's speech recognition API by default
- Requires an internet connection
- High accuracy
- No setup required

Vosk (offline):

- Offline speech recognition
- Privacy-focused (no data sent to external servers)
- Requires downloading a model
- See VOSK_MODELS.md for more information
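If you have downloaded a Vosk model yourself rather than using `setup_vosk_model.py`, you can point the assistant at its directory with the documented `--vosk-model-path` flag, for example `uv run voice_assistant_demo.py --stt-engine vosk --vosk-model-path models/vosk-model-small-en-us` (the model directory name here is only illustrative).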
| Variable | Description | Default |
|---|---|---|
| `ASSISTANT_PROMPT_TEMPLATE` | Template for LLM prompts | Basic helpful assistant template |
| `TTS_ENGINE` | Text-to-speech engine | `gtts` |
| `TTS_VOICE_ID` | Voice ID for TTS | System default |
| `TTS_RATE` | Speech rate (words per minute) | 150 |
| `TTS_VOLUME` | Speech volume (0.0 to 1.0) | 1.0 |
| `TTS_LANGUAGE` | Language code for gTTS | `en` |
| `LLM_MODEL_TYPE` | Type of LLM to use | `ollama` |
| `LLM_MODEL_NAME` | Name of the LLM model | `mistral:instruct` |
| `LLM_BASE_URL` | Base URL for the LLM API | `http://localhost:11434` |
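For example, a minimal `.env` might look like the sketch below; the values simply repeat the defaults from the table above, so adjust them for your setup:

```
# Sketch of a minimal .env using the defaults above
TTS_ENGINE=gtts
TTS_LANGUAGE=en
LLM_MODEL_TYPE=ollama
LLM_MODEL_NAME=mistral:instruct
LLM_BASE_URL=http://localhost:11434
```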
The assistant scripts support various command-line arguments:
General:

- `--verbose`: Enable verbose output and detailed logging

TTS options:

- `--tts-engine`: TTS engine to use (`pyttsx3` or `gtts`)
- `--tts-voice`: Voice ID for pyttsx3
- `--tts-rate`: Speech rate in words per minute
- `--tts-volume`: Speech volume (0.0 to 1.0)
- `--voice-profile`: Use a predefined voice profile
- `--list-voices`: List available system voices
- `--list-profiles`: List available voice profiles

STT options:

- `--stt-engine` or `--engine`: STT engine to use (`speechrecognition` or `vosk`)
- `--language`: Language code for speech recognition
- `--vosk-model-path`: Path to the Vosk model directory
- `--wake-word`: Wake word to activate the assistant (e.g., "hey assistant")

LLM options:

- `--model`: LLM model to use
- `--model-type`: Type of LLM to use
- `--base-url`: Base URL for the LLM API

install-to-mac.py options:

- `--voice-profile`: Voice profile to use (default: `default`)
- `--model`: LLM model to use (default: `mistral:instruct`)
- `--wake-word`: Wake word to activate the assistant (default: "hey assistant")
- `--install-dir`: Installation directory (default: `~/Applications/AlwaysOnAIAssistant`)
```
always-on-ai-assistant/
├── .env.sample                 # Sample environment variables
├── README.md                   # This file
├── VOSK_MODELS.md              # Information about Vosk models
├── install-to-mac.py           # macOS installation script
├── main.py                     # Main entry point
├── live_tts_demo.py            # Text-to-speech demo
├── live_stt_demo.py            # Speech-to-text demo
├── voice_assistant_demo.py     # Complete voice assistant demo
├── setup_vosk_model.py         # Script to download Vosk models
├── requirements.txt            # Project dependencies
├── logs/                       # Directory for log files
├── models/                     # Directory for Vosk models
├── voices/                     # Voice profiles
│   ├── VOICES.md               # Documentation for voice profiles
│   ├── default.json            # Default voice profile
│   ├── british.json            # British voice profile
│   └── technical.json          # Technical voice profile
├── ai_docs/                    # AI documentation
├── commands/                   # Command implementations
├── images/                     # Project images
├── layers/                     # Architectural layers
│   ├── __init__.py
│   ├── speech_input_layer.py   # Speech recognition layer
│   ├── output_layer.py         # Output layer (including TTS)
│   └── ...                     # Other layers
├── modules/                    # Core modules
│   ├── __init__.py
│   ├── assistant_config.py     # Configuration handling
│   ├── base_assistant.py       # Base assistant implementation
│   ├── data_types.py           # Data type definitions
│   ├── execute_python.py       # Python execution utilities
│   ├── ollama.py               # Ollama integration
│   ├── query_helper.py         # LLM query helper
│   ├── typer_agent.py          # Typer CLI agent
│   └── utils.py                # Utility functions
├── prompts/                    # Prompt templates
└── tests/                      # Test suite
    ├── __init__.py
    ├── .env.test               # Test environment variables
    ├── test_helper.py          # Test utilities
    ├── test_speech_input_layer.py  # Speech input tests
    └── ...                     # Various test modules
```
The assistant uses a modular architecture that makes it easy to add new voice capabilities:
- For new TTS engines, extend the `TextToSpeechOutputLayer` class (a sketch follows below)
- For new STT engines, extend the `SpeechInputLayer` class
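As a rough illustration, a new TTS engine could be added by subclassing the output layer. This is only a sketch: the `speak()` method name and the hypothetical engine are assumptions, so check `layers/output_layer.py` for the actual interface that `TextToSpeechOutputLayer` expects you to override.

```python
# Hypothetical sketch of a custom TTS output layer.
# speak() is an assumed method name; see layers/output_layer.py for the
# real methods defined by TextToSpeechOutputLayer.
from layers.output_layer import TextToSpeechOutputLayer


class MyEngineTTSOutputLayer(TextToSpeechOutputLayer):
    """TTS output layer backed by a hypothetical 'myengine' library."""

    def speak(self, text: str) -> None:
        # Call your TTS engine of choice here instead of printing.
        print(f"[myengine TTS] {text}")
```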
Create new command modules in the `commands/` directory following the template pattern.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.