Always-On AI Assistant

A versatile, modular AI assistant framework that provides continuous assistance through voice interaction and natural language processing.

Features

  • Voice Interaction: Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities for natural interaction
  • Modular Architecture: Easily extensible with new capabilities and integrations
  • Multiple LLM Support: Compatible with various Large Language Models (Ollama, OpenAI, Anthropic, etc.)
  • Customizable Prompts: Configure how the assistant responds with template-based prompting
  • Voice Profiles: Customize the assistant's voice with predefined or custom voice profiles
  • Logging System: Comprehensive logging with timestamped log files for each session
  • macOS Integration: Easy installation with Globe/Fn key shortcut support on MacBook Pro
  • Environment Configuration: Simple setup through environment variables and configuration files
  • Cross-Platform: Works on macOS, Linux, and Windows

Getting Started

Prerequisites

  • Python 3.10+
  • uv for dependency management (recommended)
  • For TTS: speakers or headphones
  • For STT: microphone

Installation

Standard Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/always-on-ai-assistant.git
    cd always-on-ai-assistant
  2. Install dependencies:

    uv pip install -r requirements.txt
  3. Create a .env file based on the sample:

    cp .env.sample .env
  4. Edit the .env file to configure your assistant (a sample snippet follows this list)

  5. For speech recognition with Vosk (optional but recommended for offline use):

    # Run the setup script to download and install a Vosk model
    uv run setup_vosk_model.py
    
    # Or specify a different model size
    uv run setup_vosk_model.py --model medium
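
The exact values in .env depend on your setup; the snippet below is only illustrative, using the defaults documented in the Environment Variables table later in this README:

# .env -- illustrative values; see the Environment Variables table below
LLM_MODEL_TYPE=ollama
LLM_MODEL_NAME=mistral:instruct
LLM_BASE_URL=http://localhost:11434
TTS_ENGINE=gtts
TTS_LANGUAGE=en
TTS_RATE=150
TTS_VOLUME=1.0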

macOS Installation with Globe/Fn Key Support

For MacBook Pro users who want to trigger the assistant with the Globe/Fn key:

# Run the macOS installer script
python install-to-mac.py

# Or specify a custom voice profile
python install-to-mac.py --voice-profile british

# Or specify a custom wake word
python install-to-mac.py --wake-word "hey computer"

The installer will:

  1. Copy the assistant files to ~/Applications/AlwaysOnAIAssistant
  2. Install required dependencies
  3. Set up a launch agent to run at login (see the plist sketch after this list)
  4. Create a keyboard shortcut for the Globe/Fn key
  5. Install the Vosk model for offline speech recognition
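
For reference, a macOS launch agent is simply a property list under ~/Library/LaunchAgents. The installer generates the real file; the label and paths below are hypothetical placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Hypothetical identifier; the installer chooses the real one -->
    <key>Label</key>
    <string>com.example.always-on-ai-assistant</string>
    <!-- launchd needs absolute paths; it does not expand ~ -->
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/python3</string>
        <string>/Users/you/Applications/AlwaysOnAIAssistant/main.py</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>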

After installation, follow the on-screen instructions to complete the keyboard shortcut setup.

Running the Assistant

Text-to-Speech Demo

uv run live_tts_demo.py

Speech-to-Text Demo

# Using SpeechRecognition (online)
uv run live_stt_demo.py

# Using Vosk (offline)
uv run live_stt_demo.py --engine vosk

Complete Voice Assistant

# Basic usage
uv run voice_assistant_demo.py

# With wake word activation
uv run voice_assistant_demo.py --wake-word "hey assistant"

# With specific engines
uv run voice_assistant_demo.py --stt-engine vosk --tts-engine gtts

# With verbose logging
uv run voice_assistant_demo.py --verbose

Voice Profiles

The assistant supports customizable voice profiles that define voice characteristics such as engine, rate, volume, and language:

Listing Available Voices

To see all available system voices:

uv run voice_assistant_demo.py --list-voices

Using Voice Profiles

The assistant comes with several predefined voice profiles:

# List available voice profiles
uv run voice_assistant_demo.py --list-profiles

# Use a specific voice profile
uv run voice_assistant_demo.py --voice-profile british

Creating Custom Voice Profiles

You can create custom voice profiles by adding JSON files to the voices directory. See VOICES.md for details.

Example voice profiles:

# pyttsx3 voice profile example
{
  "name": "My Custom Voice",
  "description": "Custom voice with specific settings",
  "engine": "pyttsx3",
  "voice_id": "com.apple.voice.compact.en-US.Samantha",
  "rate": 150,
  "volume": 0.9,
  "language": "en-US"
}

# gTTS voice profile example (default engine)
{
  "name": "My gTTS Voice",
  "description": "Google Text-to-Speech voice profile",
  "engine": "gtts",
  "language": "en-us",
  "tld": "com",
  "slow": false
}

Note: gTTS is now the default TTS engine; it has proven more reliable and avoids the "run loop not started" error that pyttsx3 can raise.
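
To illustrate how such a profile might be consumed, here is a minimal, hypothetical loader; the function name and defaults are assumptions, not the project's actual API:

import json
from pathlib import Path

def load_voice_profile(name: str, voices_dir: str = "voices") -> dict:
    """Load a voice profile JSON by name, e.g. load_voice_profile("british")."""
    profile_path = Path(voices_dir) / f"{name}.json"
    with profile_path.open() as f:
        return json.load(f)

profile = load_voice_profile("default")
print(profile["engine"], profile.get("language"))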

Logging System

The assistant includes a comprehensive logging system that records all activities:

  • Logs are stored in the logs directory
  • Each session creates a timestamped log file (e.g., voice_assistant_2025-03-03_00-51-23.log)
  • Logs include information about:
    • Voice profile and engine settings
    • Speech recognition events
    • LLM queries and responses
    • Errors and warnings

To enable verbose logging:

uv run voice_assistant_demo.py --verbose
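
The timestamped, per-session files described above can be produced with Python's standard logging module; this is only an illustrative sketch, not the project's exact logger setup:

import logging
from datetime import datetime
from pathlib import Path

# One timestamped file per session, e.g. logs/voice_assistant_2025-03-03_00-51-23.log
Path("logs").mkdir(exist_ok=True)
log_file = f"logs/voice_assistant_{datetime.now():%Y-%m-%d_%H-%M-%S}.log"
logging.basicConfig(
    filename=log_file,
    level=logging.DEBUG,  # a --verbose flag might lower the level to DEBUG
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.info("Session started")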

Speech Recognition

The assistant supports two speech recognition engines:

SpeechRecognition

  • Uses Google's speech recognition API by default
  • Requires an internet connection
  • High accuracy
  • No setup required
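
In isolation, this path looks roughly like the following (a sketch of the SpeechRecognition library's documented API, not the project's wrapper code):

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # optional noise calibration
    audio = recognizer.listen(source)

# recognize_google sends audio to Google's web API, so it needs a connection
print(recognizer.recognize_google(audio))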

Vosk

  • Offline speech recognition
  • Privacy-focused (no data sent to external servers)
  • Requires downloading a model
  • See VOSK_MODELS.md for more information
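
The offline Vosk path, in minimal form (the model path is an assumption; point it at wherever setup_vosk_model.py placed your model):

import json
import pyaudio
from vosk import Model, KaldiRecognizer

model = Model("models/vosk-model-small-en-us-0.15")  # assumed model location
recognizer = KaldiRecognizer(model, 16000)

mic = pyaudio.PyAudio()
stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000,
                  input=True, frames_per_buffer=8000)

while True:  # press Ctrl+C to stop
    data = stream.read(4000)
    if recognizer.AcceptWaveform(data):  # True at the end of an utterance
        print(json.loads(recognizer.Result())["text"])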

Configuration Options

Environment Variables

Variable                   Description                       Default
ASSISTANT_PROMPT_TEMPLATE  Template for LLM prompts          Basic helpful assistant template
TTS_ENGINE                 Text-to-speech engine             gtts
TTS_VOICE_ID               Voice ID for TTS                  System default
TTS_RATE                   Speech rate (words per minute)    150
TTS_VOLUME                 Speech volume (0.0 to 1.0)        1.0
TTS_LANGUAGE               Language code for gTTS            en
LLM_MODEL_TYPE             Type of LLM to use                ollama
LLM_MODEL_NAME             Name of the LLM model             mistral:instruct
LLM_BASE_URL               Base URL for the LLM API          http://localhost:11434
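
If you script against the same configuration, these variables can be read with python-dotenv (assuming that is how the project loads .env; the snippet is illustrative):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
tts_engine = os.getenv("TTS_ENGINE", "gtts")
model_name = os.getenv("LLM_MODEL_NAME", "mistral:instruct")
base_url = os.getenv("LLM_BASE_URL", "http://localhost:11434")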

Command Line Arguments

The assistant scripts support various command-line arguments:

Common Arguments

  • --verbose: Enable verbose output and detailed logging

Text-to-Speech Arguments

  • --tts-engine: TTS engine to use (pyttsx3 or gtts)
  • --tts-voice: Voice ID for pyttsx3
  • --tts-rate: Speech rate in words per minute
  • --tts-volume: Speech volume (0.0 to 1.0)
  • --voice-profile: Use a predefined voice profile
  • --list-voices: List available system voices
  • --list-profiles: List available voice profiles

Speech-to-Text Arguments

  • --stt-engine or --engine: STT engine to use (speechrecognition or vosk)
  • --language: Language code for speech recognition
  • --vosk-model-path: Path to the Vosk model directory
  • --wake-word: Wake word to activate the assistant (e.g., "hey assistant")

LLM Arguments

  • --model: LLM model to use
  • --model-type: Type of LLM to use
  • --base-url: Base URL for the LLM API

macOS Installer Arguments

  • --voice-profile: Voice profile to use (default: default)
  • --model: LLM model to use (default: mistral:instruct)
  • --wake-word: Wake word to activate the assistant (default: "hey assistant")
  • --install-dir: Installation directory (default: ~/Applications/AlwaysOnAIAssistant)

Project Structure

always-on-ai-assistant/
├── .env.sample                # Sample environment variables
├── README.md                  # This file
├── VOSK_MODELS.md             # Information about Vosk models
├── install-to-mac.py          # macOS installation script
├── main.py                    # Main entry point
├── live_tts_demo.py           # Text-to-speech demo
├── live_stt_demo.py           # Speech-to-text demo
├── voice_assistant_demo.py    # Complete voice assistant demo
├── setup_vosk_model.py        # Script to download Vosk models
├── requirements.txt           # Project dependencies
├── logs/                      # Directory for log files
├── models/                    # Directory for Vosk models
├── voices/                    # Voice profiles
│   ├── VOICES.md              # Documentation for voice profiles
│   ├── default.json           # Default voice profile
│   ├── british.json           # British voice profile
│   └── technical.json         # Technical voice profile
├── ai_docs/                   # AI documentation
├── commands/                  # Command implementations
├── images/                    # Project images
├── layers/                    # Architectural layers
│   ├── __init__.py
│   ├── speech_input_layer.py  # Speech recognition layer
│   ├── output_layer.py        # Output layer (including TTS)
│   └── ...                    # Other layers
├── modules/                   # Core modules
│   ├── __init__.py
│   ├── assistant_config.py    # Configuration handling
│   ├── base_assistant.py      # Base assistant implementation
│   ├── data_types.py          # Data type definitions
│   ├── execute_python.py      # Python execution utilities
│   ├── ollama.py              # Ollama integration
│   ├── query_helper.py        # LLM query helper
│   ├── typer_agent.py         # Typer CLI agent
│   └── utils.py               # Utility functions
├── prompts/                   # Prompt templates
└── tests/                     # Test suite
    ├── __init__.py
    ├── .env.test              # Test environment variables
    ├── test_helper.py         # Test utilities
    ├── test_speech_input_layer.py # Speech input tests
    └── ...                    # Various test modules

Extending the Assistant

Adding New Voice Capabilities

The assistant uses a modular architecture that makes it easy to add new voice capabilities (a minimal subclass sketch follows this list):

  1. For new TTS engines, extend the TextToSpeechOutputLayer class
  2. For new STT engines, extend the SpeechInputLayer class
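
As a sketch, a new TTS engine might subclass the output layer like this; the speak method name and import path are assumptions, so check layers/output_layer.py for the real interface:

import subprocess

from layers.output_layer import TextToSpeechOutputLayer

class EspeakTTSLayer(TextToSpeechOutputLayer):
    """Hypothetical example: wrap the espeak CLI as a new TTS engine."""

    def speak(self, text: str) -> None:  # assumed base-class method name
        subprocess.run(["espeak", text], check=True)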

Custom Commands

Create new command modules in the commands/ directory following the template pattern.
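
The actual template pattern lives in the existing modules under commands/; as a purely generic illustration (all names hypothetical), a Typer-style command module might look like this, given that the project already ships a Typer CLI agent:

# commands/greet.py -- hypothetical example; mirror an existing module instead
import typer

app = typer.Typer()

@app.command()
def greet(name: str = "world") -> None:
    """Say hello -- a stand-in for a real assistant command."""
    typer.echo(f"Hello, {name}!")

if __name__ == "__main__":
    app()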

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.