This project is an advanced AI voice assistant powered by Google's Gemini language model. It uses voice recognition to detect a wake word, processes user queries, and responds using text-to-speech technology.
- Wake word detection ("Jarvis")
- Voice input processing
- Natural language understanding and generation using Gemini AI
- Text-to-speech output
- Conversation history management
- Integration with various tools:
- Setting timers and alarms
- Calendar management (Google Calendar API)
- Weather information retrieval (OpenWeatherMap API)
- News fetching (Google News API)
- Smart home device control (Home Assistant API)
- To-do list and reminder management
- Web search functionality (Google Search API)
- Message and email sending
- Music and podcast playback
- Directions and traffic information (Google Maps API)
- Flight status checking
- Phone messaging via Discord API
- Python 3.8+
- pip
- virtualenv
-
Clone the repository:
git clone https://github.com/xenioxyt/jarvis-gemini.git cd jarvis-gemini
-
Create and activate a virtual environment:
python -m venv venv source venv/bin/activate # On Windows, use `venv\Scripts\activate`
-
Install the required packages:
pip install -r requirements.txt
-
Set up environment variables: Create a
.env
file in the project root and add the following:PICOVOICE_ACCESS_KEY=your_picovoice_access_key GEMINI_API_KEY=your_gemini_api_key GOOGLE_CREDENTIALS=path/to/your/google-credentials.json GOOGLE_API_KEY=your_google_api_key GOOGLE_CSE_ID=your_google_cse_id OPENWEATHER_API_KEY=your_openweather_api_key GOOGLE_CLOUD_API_KEY=your_google_cloud_api_key NEWS_API_KEY=your_news_api_key DISCORD_BOT_TOKEN=your_discord_bot_token
-
Ensure you have the necessary credentials file for Google Cloud services.
To start the voice assistant, run:
python main.py
The assistant will listen for the wake word "Jarvis". Once detected, it will record your query, process it using the Gemini AI, and respond using text-to-speech.
main.py
: The entry point of the application.wake_word_detector.py
: Handles wake word detection.audio_recorder.py
: Manages audio recording after wake word detection.gemini_api.py
: Interfaces with the Gemini AI for natural language processing.tts_engine.py
: Handles text-to-speech conversion and audio playback.tools.py
: Contains various tool functions for extended functionality.prompt.py
: Defines the system prompt for the AI assistant.function_tools/
: Directory containing individual tool implementations:weather.py
: Retrieves weather information.google_search.py
: Performs web searches.news.py
: Fetches news articles.directions.py
: Provides directions and traffic information.discord_message.py
: Sends messages via Discord.phone_message.py
: Sends messages to phones.place_info.py
: Retrieves information about places.take_notes.py
: Manages note-taking functionality.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.