Gemini Voice Assistant

Primary Language: Python

This repository contains the code for a voice-powered AI assistant that combines wake-word detection, speech recognition, and natural language generation. The sections below describe each component and explain how to set up and use the assistant.

Python 3.11

Components

  1. Whisper Model (Speech Recognition)

    • The faster_whisper library provides a lightweight Whisper speech recognition model.
    • The assistant listens for a wake word (e.g., "chris") and then captures audio for transcription.
    • Adjust whisper_size and the other model parameters as needed.
  2. OpenAI API (Natural Language Generation)

    • The openai library provides access to OpenAI's language models.
    • Set your OpenAI API key in the OPENAI_KEY variable.
  3. Google API (Configuration)

    • The genai library (google.generativeai) is configured with a Google API key, which gives the assistant access to the Gemini model.
    • Replace GOOGLE_API_KEY with your own API key, available from https://ai.google.dev/.
  4. Conversation with Gemini Model

    • The gemini-1.0-pro-latest model from GenAI powers the conversation.
    • Safety settings are configured to block harmful content.
    • The model generates responses based on input.
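
The components above could be wired together roughly as follows. This is a minimal sketch, not the repository's actual code. Reading the keys from environment variables, the `WHISPER_SIZE` variable, the `AssistantConfig`, `load_config`, and `build_components` names, the `"base"` default model size, the CPU/int8 settings, and the specific safety thresholds are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass
class AssistantConfig:
    """Hypothetical container for the settings the README mentions."""
    openai_key: str
    google_api_key: str
    whisper_size: str = "base"  # assumed default; the README only says "adjust as needed"
    gemini_model: str = "gemini-1.0-pro-latest"


def load_config(env=None):
    """Build a config from a mapping of variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in ("OPENAI_KEY", "GOOGLE_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return AssistantConfig(
        openai_key=env["OPENAI_KEY"],
        google_api_key=env["GOOGLE_API_KEY"],
        whisper_size=env.get("WHISPER_SIZE", "base"),
    )


# One entry per harm category; the threshold is an assumed choice.
SAFETY_SETTINGS = [
    {"category": category, "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
    for category in (
        "HARM_CATEGORY_HARASSMENT",
        "HARM_CATEGORY_HATE_SPEECH",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "HARM_CATEGORY_DANGEROUS_CONTENT",
    )
]


def build_components(config):
    """Create the Whisper model, the OpenAI client, and a Gemini chat session."""
    # Imported lazily so the config helpers work without the heavy dependencies.
    from faster_whisper import WhisperModel
    import google.generativeai as genai
    from openai import OpenAI

    whisper_model = WhisperModel(config.whisper_size, device="cpu", compute_type="int8")
    openai_client = OpenAI(api_key=config.openai_key)

    genai.configure(api_key=config.google_api_key)
    model = genai.GenerativeModel(config.gemini_model, safety_settings=SAFETY_SETTINGS)
    convo = model.start_chat()
    return whisper_model, openai_client, convo
```

Calling `build_components(load_config())` would load the Whisper weights and open a chat session, so it needs the three libraries installed and valid keys in the environment.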

Usage

  1. Wake Word Detection

    • The assistant listens for the wake word ("chris").
    • When detected, it starts capturing audio.
  2. Speech Recognition

    • The Whisper model processes the captured audio.
    • Adjust the wake word and other parameters as needed.
  3. Natural Language Generation

    • The OpenAI API generates responses based on user input.
    • The Gemini model provides conversational capabilities.
  4. System Messages

    • The assistant responds to system messages (e.g., "AFFIRMATIVE").
    • Follow the instructions provided by the system.
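
The usage flow above can be sketched as: transcribe the captured audio, check the transcript for the wake word, and forward whatever follows it to the Gemini conversation. The helper names (`extract_prompt`, `transcribe`, `respond`, `handle_audio`) are hypothetical. The sketch assumes `whisper_model` is a `faster_whisper.WhisperModel` and `convo` is a chat session returned by `GenerativeModel.start_chat()`. Microphone capture and system-message handling are left out because the README does not describe them.

```python
import re

WAKE_WORD = "chris"


def extract_prompt(transcript, wake_word=WAKE_WORD):
    """Return the text after the wake word, or None if the wake word is absent."""
    # Match the wake word as a whole word, then skip trailing punctuation and spaces.
    pattern = rf"\b{re.escape(wake_word)}\b[\s,.!?:;-]*"
    match = re.search(pattern, transcript, flags=re.IGNORECASE)
    if match is None:
        return None
    return transcript[match.end():].strip()


def transcribe(whisper_model, audio_path):
    """Join the text of every segment faster_whisper yields for the audio file."""
    segments, _info = whisper_model.transcribe(audio_path)
    return "".join(segment.text for segment in segments).strip()


def respond(convo, prompt):
    """Send the prompt to the Gemini chat session and return the reply text."""
    convo.send_message(prompt)
    return convo.last.text


def handle_audio(whisper_model, convo, audio_path):
    """Run one pass of the loop; return None if there is nothing to answer."""
    prompt = extract_prompt(transcribe(whisper_model, audio_path))
    if not prompt:
        return None  # no wake word, or nothing said after it
    return respond(convo, prompt)
```

Matching on word boundaries keeps words that merely contain the wake word (such as "christmas") from triggering the assistant.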

Getting Started

  1. Clone this repository to your local machine.
  2. Install the required dependencies (faster_whisper, openai, genai, etc.).
  3. Set your API keys in the appropriate variables.
  4. Run the main script to start the assistant.
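
Before running the main script, a small pre-flight check like the one below can confirm that the dependencies from step 2 are importable. It is a sketch under assumptions: the `missing_packages` and `preflight_message` helpers are hypothetical, and the module-to-package mapping covers only the three libraries this README names, so the actual project may need more.

```python
import importlib.util

# Import name -> pip package name, for the libraries named in this README.
REQUIRED_MODULES = {
    "faster_whisper": "faster-whisper",
    "openai": "openai",
    "google.generativeai": "google-generativeai",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of required modules that cannot be found."""
    missing = []
    for module_name, pip_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is not installed.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


def preflight_message(required=REQUIRED_MODULES):
    """Describe what, if anything, still needs to be installed."""
    missing = missing_packages(required)
    if missing:
        return "Missing packages; try: pip install " + " ".join(missing)
    return "All listed dependencies were found."
```

Running `print(preflight_message())` in the project's Python environment reports which of the listed packages still need to be installed.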

Contributions

Feel free to contribute to this project by adding new features, improving existing code, or enhancing the conversation model. Happy coding! 🚀