This project is a real-time voice chat interface using OpenAI's GPT-4o model. The system accepts spoken input from users, processes it through the OpenAI's real-time API, and returns an audio response that the user can listen to instantly.
- Real-time interaction with OpenAI's GPT-4o Model: Converse with the AI using your voice.
- Audio Processing: The application processes and decodes audio data for smooth interactions.
- Gradio Interface: Provides an easy-to-use web interface for voice recording and playback.
- Asynchronous Websocket Communication: Utilizes Python's
asyncio
andwebsockets
for efficient real-time data transfer.
Ensure you have the following installed:
- Python 3.7 or above
- Virtual Environment (recommended)
- API key from OpenAI with access to the realtime API
- Packages listed in the
requirements.txt
(see below for details)
-
Clone the repository:
git clone https://github.com/nucleargeeketh/openai-realtime-python-example.git cd openai-realtime-python-example
-
Create and activate a virtual environment:
- On macOS and Linux:
python3 -m venv venv source venv/bin/activate
- On Windows:
python -m venv venv .\venv\Scripts\activate
- On macOS and Linux:
-
Install the dependencies:
pip install -r requirements.txt
-
Set up the environment variables:
Create a
.env
file in the project's root directory and add your OpenAI API key:OPENAI_API_KEY=your-openai-api-key
-
Run the application:
python main.py
-
Access the Gradio Interface:
Open your browser and navigate to the provided localhost URL (e.g.,
http://127.0.0.1:7860/
). -
Interact with the Model:
- Go to the "VoiceChat" tab.
- Use the microphone to record your questions or inputs.
- Receive a real-time audio response generated by the model.
-
Audio Input: Capture user voice input through Gradio's audio interface set to numpy arrays.
-
WebSocket Connection: Establish a secure WebSocket connection to OpenAI's realtime API using the provided API key.
-
Data Serialization: Convert the audio data to base64 and package it in a JSON format for sending over the WebSocket.
-
Response Handling: Receive streamed audio data from the OpenAI server, decode it, and prepare it for playback.
-
Output: Play the AI-generated audio response back to the user.
main.py
: Main application script to run the Gradio interface.requirements.txt
: Lists the necessary Python libraries to be installed..env
: Stores environment variables including sensitive API keys.
This project depends on several key libraries:
websockets
: For maintaining WebSocket connections.pydub
andsoundfile
: For audio processing.gradio
: For building and managing the web interface.python-dotenv
: For loading environment variables from a.env
file.
Contributions are welcome! Please submit a pull request or open an issue to discuss any changes.
For any issues or feature requests, please open an issue in this GitHub repository. Happy chatting with AI in real-time!