This project creates an AI voice assistant that uses Twilio Voice, ConversationRelay, and the Google Gemini API to hold two-way conversations over a phone call.
This application allows users to call a Twilio number and interact with an AI assistant powered by Google's gemini-2.5-flash model. The assistant will respond to user queries in natural, spoken language.
- Python 3.10+
- A Twilio Account: Sign up for a free trial.
- A Twilio Number with Voice Capabilities: Instructions to purchase a number.
- A Google AI API Key: Visit Google AI Studio to generate a key for free.
- Clone this repository:
    git clone https://github.com/rishabkumar7/twilio-cr-gemini-python
    cd twilio-cr-gemini-python
- Install the required dependencies. A virtual environment is recommended.
    pip install -r requirements.txt
- Configure your environment variables by creating a .env file in the root of your project:
  - You can copy the example with cp .env.example .env (if you have one) or create the file manually.
  - Add your keys to the .env file:
      # .env file
      GOOGLE_API_KEY="YOUR_GOOGLE_AI_API_KEY_HERE"
      NGROK_URL="your-ngrok-forwarding-url.ngrok-free.app"
- Start ngrok to expose your local server to the internet on port 8080:
    ngrok http 8080
- Copy the https:// forwarding URL from your ngrok terminal and update NGROK_URL in your .env file with the domain part (e.g., your-ngrok-forwarding-url.ngrok-free.app).
- Run the application:
    python main.py
- Configure your Twilio phone number's voice webhook. In the Twilio console, navigate to your number's settings and, under "A CALL COMES IN", set the webhook to your ngrok URL with the /twiml endpoint (e.g., https://your-ngrok-forwarding-url.ngrok-free.app/twiml).
- Call your Twilio number and start talking to your new Gemini-powered voice assistant!
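The setup steps ask for only the domain part of the ngrok URL in NGROK_URL, which is easy to get wrong when pasting. The sketch below shows one way to normalize the value and derive the two public URLs used in this project. The helper names `ngrok_domain` and `public_urls` are hypothetical and not taken from main.py, and the `wss://` scheme for the /ws endpoint is an assumption based on ngrok forwarding over HTTPS.

```python
import os
from urllib.parse import urlparse


def ngrok_domain(value: str) -> str:
    """Return only the host part of an ngrok forwarding URL or bare domain."""
    value = value.strip().strip('"')
    # urlparse only fills in netloc when a scheme is present, so add one if missing.
    parsed = urlparse(value if "://" in value else f"https://{value}")
    if not parsed.netloc:
        raise ValueError(f"Could not read a domain from {value!r}")
    return parsed.netloc


def public_urls(domain: str) -> dict[str, str]:
    """Build the webhook and WebSocket URLs from the bare domain."""
    return {
        "twiml": f"https://{domain}/twiml",  # goes in the Twilio console
        "ws": f"wss://{domain}/ws",          # assumed scheme for the WebSocket
    }


if __name__ == "__main__":
    # Falls back to the placeholder from the .env example when NGROK_URL is unset.
    raw = os.environ.get("NGROK_URL", "your-ngrok-forwarding-url.ngrok-free.app")
    for name, url in public_urls(ngrok_domain(raw)).items():
        print(f"{name}: {url}")
```

Running a check like this before starting the server can catch a pasted `https://` prefix or trailing slash before it ends up in the webhook configuration.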
- When a user calls the Twilio number, Twilio makes an HTTP request to the /twiml endpoint.
- The application returns TwiML, which instructs Twilio to establish a WebSocket connection to the server at /ws.
- Voice input from the user is transcribed by Twilio and sent to the server as JSON messages over the WebSocket.
- The server sends the transcribed text to the Google Gemini API and gets a response.
- The AI-generated text response is sent back to Twilio through the WebSocket.
- Twilio's built-in Text-to-Speech (TTS) engine converts the text to audio and plays it for the user.
- The conversation continues until the call is disconnected.
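The flow above can be sketched with two small functions: one that builds the TwiML returned by /twiml, and one that turns an incoming WebSocket message into a reply. This is a minimal illustration, not the code in main.py. The function names are hypothetical, the Gemini call is replaced by an injected `generate_reply` callable, and the message fields (`type`, `voicePrompt`, `token`, `last`) are assumptions about the ConversationRelay JSON format that should be checked against the Twilio documentation.

```python
import json
from typing import Callable, Optional
from xml.sax.saxutils import quoteattr


def build_twiml(ws_url: str, greeting: str) -> str:
    """Return TwiML that asks Twilio to open a ConversationRelay WebSocket."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        # quoteattr escapes the values and wraps them in quotes.
        f"<ConversationRelay url={quoteattr(ws_url)} "
        f"welcomeGreeting={quoteattr(greeting)} />"
        "</Connect></Response>"
    )


def handle_message(raw: str, generate_reply: Callable[[str], str]) -> Optional[str]:
    """Map one incoming JSON message to an outgoing JSON message, if any.

    generate_reply stands in for the Gemini API call: it takes the
    transcribed user text and returns the assistant's text.
    """
    message = json.loads(raw)
    if message.get("type") == "prompt":
        # Assumed field name for the transcribed speech.
        reply = generate_reply(message["voicePrompt"])
        # "last": True marks the reply as complete so Twilio can speak it.
        return json.dumps({"type": "text", "token": reply, "last": True})
    # Other message types (e.g. setup, interrupt) get no spoken reply here.
    return None


if __name__ == "__main__":
    print(build_twiml("wss://your-ngrok-forwarding-url.ngrok-free.app/ws", "Hello!"))
    incoming = json.dumps({"type": "prompt", "voicePrompt": "What can you do?"})
    print(handle_message(incoming, lambda text: f"You asked: {text}"))
```

Passing the model call in as a function keeps the message handling testable without network access or an API key; in the real server the same logic would sit inside the WebSocket receive loop.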
- main.py: The main application file containing the FastAPI server, WebSocket handler, and Google Gemini integration.
- requirements.txt: A file listing the Python dependencies.
- .env: A file for storing environment variables like your GOOGLE_API_KEY and NGROK_URL.