This repository contains code for a voice-powered AI assistant that can perform various tasks, including speech recognition, natural language generation, and more. Below, I'll provide an overview of the components and instructions for setting up and using this assistant.
**Requirements**

- Python 3.11

**Components**

**Whisper Model (Speech Recognition)**

- The `faster_whisper` library provides a lightweight speech recognition model.
- It listens for a wake word (e.g., "chris") and captures audio.
- Adjust the `whisper_size` and other parameters as needed.
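A minimal sketch of loading the model and transcribing a clip — the model size, device, and file name below are placeholder values, so match them to your own `whisper_size` setting:

```python
# Minimal sketch: transcribe an audio file with faster_whisper.
# The model size, device, compute type, and file name are placeholders.
from faster_whisper import WhisperModel

whisper_size = "base"  # e.g. "tiny", "base", "small"
model = WhisperModel(whisper_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("captured_audio.wav")
text = " ".join(segment.text for segment in segments)
print(text)
```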
**OpenAI API (Natural Language Generation)**

- The `openai` library allows interaction with OpenAI's powerful language models.
- Set your OpenAI API key in the `OPENAI_KEY` variable.
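For reference, a minimal completion call using the current `openai` client style; the model name is only an example and may differ from the one the script uses:

```python
# Minimal sketch: generate a reply with the OpenAI API (openai>=1.0 client style).
# OPENAI_KEY and the model name are placeholders.
from openai import OpenAI

OPENAI_KEY = "sk-..."  # put your own key here
client = OpenAI(api_key=OPENAI_KEY)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, assistant."}],
)
print(response.choices[0].message.content)
```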
**Google API (Configuration)**

- The `genai` library configures the Google API for additional functionality.
- Replace `GOOGLE_API_KEY` with your own API key (get one at https://ai.google.dev/).
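Configuration itself is a single call; the key value below is a placeholder:

```python
# Minimal sketch: configure the Google Generative AI client.
# GOOGLE_API_KEY is a placeholder; create a key at https://ai.google.dev/
import google.generativeai as genai

GOOGLE_API_KEY = "your-api-key-here"
genai.configure(api_key=GOOGLE_API_KEY)
```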
**Conversation with Gemini Model**

- The `gemini-1.0-pro-latest` model from GenAI powers the conversation.
- Safety settings are configured to block harmful content.
- The model generates responses based on input.
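A sketch of what the chat setup might look like; the threshold values are illustrative rather than the exact ones used in the script, and `genai.configure` must already have been called:

```python
# Illustrative sketch: a Gemini chat session with safety settings that block
# harmful content. The threshold values are examples only.
import google.generativeai as genai

safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
]

model = genai.GenerativeModel(
    model_name="gemini-1.0-pro-latest",
    safety_settings=safety_settings,
)
chat = model.start_chat()
response = chat.send_message("Hello!")
print(response.text)
```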
**How it works**

**Wake Word Detection**

- The assistant listens for the wake word ("chris").
- When detected, it starts capturing audio (see the sketch after the Speech Recognition notes below).
**Speech Recognition**

- The Whisper model processes the captured audio.
- Adjust the wake word and other parameters as needed.
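The wake-word and transcription steps might fit together roughly like this; `record_chunk()` is a hypothetical stand-in for whatever microphone-capture code the project actually uses, and the wake-word check is a simple substring match:

```python
# Rough sketch of the listen loop: transcribe short chunks of microphone audio
# and capture a full request once the wake word appears.
from faster_whisper import WhisperModel

WAKE_WORD = "chris"

def record_chunk() -> str:
    """Hypothetical stand-in for the project's microphone capture code;
    it should record a short clip and return the path to a WAV file."""
    raise NotImplementedError

def wait_for_command(model: WhisperModel) -> str:
    while True:
        segments, _ = model.transcribe(record_chunk())
        heard = " ".join(s.text for s in segments).lower()
        if WAKE_WORD in heard:
            # Wake word detected: capture and transcribe the actual request.
            segments, _ = model.transcribe(record_chunk())
            return " ".join(s.text for s in segments)
```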
**Natural Language Generation**

- The OpenAI API generates responses based on user input.
- The Gemini model provides conversational capabilities.
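One possible way to hand the transcribed request to a backend (illustrative only; the repository may split the work between OpenAI and Gemini differently):

```python
# Illustrative only: pass the transcribed request to the conversational model.
# `chat` is a Gemini chat session like the one created above; an OpenAI call
# could be substituted at the same point.
def generate_reply(chat, user_text: str) -> str:
    response = chat.send_message(user_text)
    return response.text
```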
**System Messages**

- The assistant responds to system messages (e.g., "AFFIRMATIVE").
- Follow the instructions provided by the system.
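The exact system prompt lives in the code; the snippet below is only a guess at its shape, showing how a system message that asks the model to acknowledge commands with "AFFIRMATIVE" could be passed alongside the user's request:

```python
# Illustrative only: the wording of this system prompt is an assumption,
# not the text used in this repository. `client` is the OpenAI client
# created in the OpenAI API section above.
SYSTEM_PROMPT = (
    "You are a voice assistant. When you receive a command you can carry out, "
    "acknowledge it with 'AFFIRMATIVE' before answering."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Turn on the lights."},
]
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)
```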
**Setup**

- Clone this repository to your local machine.
- Install the required dependencies (`faster_whisper`, `openai`, `google-generativeai`, etc.).
- Set your API keys in the appropriate variables.
- Run the main script to start the assistant.
Feel free to contribute to this project by adding new features, improving existing code, or enhancing the conversation model. Happy coding! 🚀