AI Voice Assistant with Twilio and Google Gemini (Python)

This project creates an AI voice assistant that uses Twilio Voice with ConversationRelay and the Google Gemini API to hold two-way conversations over a phone call.

Overview

This application lets users call a Twilio number and talk to an AI assistant powered by Google's gemini-2.5-flash model. The assistant responds to the caller's questions in natural, spoken language.

Prerequisites

- Python 3.10+
- A Twilio account (you can sign up for a free trial)
- A Twilio phone number with Voice capabilities (see Twilio's instructions for purchasing a number)
- A Google AI API key, which you can generate for free in Google AI Studio

Installation

Clone this repository:
```
git clone https://github.com/rishabkumar7/twilio-cr-gemini-python
cd twilio-cr-gemini-python
```

Install the required dependencies. It's recommended to use a virtual environment.
```
pip install -r requirements.txt
```
Configure your environment variables by creating a .env file in the root of the project:
- If the repository includes a .env.example file, copy it with cp .env.example .env; otherwise, create .env manually.
- Add your keys to the .env file:
```
# .env file
GOOGLE_API_KEY="YOUR_GOOGLE_AI_API_KEY_HERE"
NGROK_URL="your-ngrok-forwarding-url.ngrok-free.app"
```
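
A minimal sketch of how an application might read these two values at startup and fail early if either is missing. It assumes the variables are already in the process environment (for example, loaded from .env by python-dotenv's `load_dotenv()`; whether this project does that is an assumption). `Settings` and `load_settings` are hypothetical names, not taken from main.py.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two values this README asks for."""
    google_api_key: str
    ngrok_domain: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read GOOGLE_API_KEY and NGROK_URL, raising if either is missing or empty.

    Assumes .env has already been loaded into the environment
    (e.g. by python-dotenv); this is an assumption, not confirmed by the source.
    """
    missing = [name for name in ("GOOGLE_API_KEY", "NGROK_URL") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    # NGROK_URL should be the bare domain; tolerate a pasted scheme or trailing slash.
    domain = env["NGROK_URL"].strip()
    domain = domain.removeprefix("https://").removeprefix("http://").rstrip("/")

    return Settings(google_api_key=env["GOOGLE_API_KEY"], ngrok_domain=domain)
```

For example, `load_settings({"GOOGLE_API_KEY": "abc", "NGROK_URL": "https://example.ngrok-free.app/"})` yields a `Settings` whose `ngrok_domain` is `example.ngrok-free.app`, so a full URL pasted by mistake is reduced to the domain part.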

Usage

Start ngrok to expose your local server to the internet on port 8080:
```
ngrok http 8080
```
Copy the https:// forwarding URL from the ngrok terminal and set NGROK_URL in your .env file to its domain part only, without the https:// prefix (e.g., your-ngrok-forwarding-url.ngrok-free.app).
Run the application:
```
python main.py
```
Configure your Twilio phone number's voice webhook. In the Twilio console, navigate to your number's settings and under "A CALL COMES IN", set the webhook to your ngrok URL with the /twiml endpoint (e.g., https://your-ngrok-forwarding-url.ngrok-free.app/twiml).
Call your Twilio number and start talking to your new Gemini-powered voice assistant!
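
Two public URLs are derived from the NGROK_URL domain: the HTTPS webhook you paste into the Twilio console (the /twiml endpoint) and the secure WebSocket URL (the /ws endpoint described under How It Works) that Twilio connects to. The sketch below is illustrative only; `public_urls` is a hypothetical helper, and the assumption that the WebSocket URL uses the same domain with a wss:// scheme is inferred from this README rather than taken from main.py.

```python
def public_urls(ngrok_domain: str) -> dict[str, str]:
    """Return the Twilio webhook URL and the WebSocket URL for a bare ngrok domain."""
    if "://" in ngrok_domain or "/" in ngrok_domain:
        raise ValueError("Expected a bare domain such as 'abc.ngrok-free.app'")
    return {
        # Paste this into the Twilio console under "A CALL COMES IN".
        "webhook": f"https://{ngrok_domain}/twiml",
        # Twilio opens the ConversationRelay WebSocket here over a secure wss:// connection.
        "websocket": f"wss://{ngrok_domain}/ws",
    }
```

With the example domain, `public_urls("your-ngrok-forwarding-url.ngrok-free.app")["webhook"]` returns `https://your-ngrok-forwarding-url.ngrok-free.app/twiml`, which matches the webhook value given above.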

How It Works

1. When a user calls the Twilio number, Twilio makes an HTTP request to the /twiml endpoint.
2. The application returns TwiML, which instructs Twilio to establish a WebSocket connection to the server at /ws.
3. Voice input from the user is transcribed by Twilio and sent to the server as JSON messages over the WebSocket.
4. The server sends the transcribed text to the Google Gemini API and gets a response.
5. The AI-generated text response is sent back to Twilio through the WebSocket.
6. Twilio's built-in Text-to-Speech (TTS) engine converts the text to audio and plays it for the user.
7. The conversation continues until the call is disconnected.
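
The flow above can be sketched with the standard library alone. This is a minimal illustration, not the code in main.py: `build_twiml` and `handle_message` are hypothetical names, the greeting text is made up, and the Gemini call is replaced by a stub `generate_reply` callable. The message shapes (an incoming `setup` message, a `prompt` message carrying `voicePrompt`, and an outgoing `text` message with `token` and `last`) follow Twilio's ConversationRelay documentation as understood here, so verify them against the current docs.

```python
import json
from typing import Callable, Optional
from xml.etree import ElementTree as ET


def build_twiml(ws_url: str, greeting: str = "Hi! How can I help you today?") -> str:
    """Return TwiML telling Twilio to open a ConversationRelay WebSocket to ws_url."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # Keyword arguments become XML attributes on the <ConversationRelay> element.
    ET.SubElement(connect, "ConversationRelay", url=ws_url, welcomeGreeting=greeting)
    return '<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(response, encoding="unicode")


def handle_message(raw: str, generate_reply: Callable[[str], str]) -> Optional[str]:
    """Process one JSON message from Twilio; return the JSON reply to send, if any."""
    message = json.loads(raw)
    kind = message.get("type")

    if kind == "setup":
        # Sent once when the WebSocket opens; carries call metadata such as callSid.
        return None
    if kind == "prompt":
        # voicePrompt holds Twilio's transcription of what the caller said.
        reply = generate_reply(message["voicePrompt"])
        # One "text" message with last=True hands Twilio the complete reply to speak.
        return json.dumps({"type": "text", "token": reply, "last": True})
    # Other message types (e.g. "interrupt", "dtmf", "error") are ignored in this sketch.
    return None
```

In a real server, `build_twiml` would back the /twiml route and `handle_message` would run inside the /ws WebSocket loop, with `generate_reply` calling Gemini. Sending the whole reply in a single message keeps the sketch simple; streaming partial tokens with `last` set to false until the final one is an alternative that can reduce perceived latency.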

Project Structure

- main.py: The main application file containing the FastAPI server, WebSocket handler, and Google Gemini integration.
- requirements.txt: A file listing the Python dependencies.
- .env: A file for storing environment variables like your GOOGLE_API_KEY and NGROK_URL.