This script provides real-time descriptions of live video frames using OpenCV, OpenAI's GPT-4 Vision model, and OpenAI's text-to-speech (TTS) model. It captures frames from the default camera, displays the live video feed, and sends frames to GPT-4 Vision, which generates textual descriptions that take preceding frames into account for context. The TTS model then reads each description aloud.
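As a rough sketch of the GPT-4 Vision side of this pipeline: each captured frame is typically base64-encoded into a data URL and embedded in a chat message alongside a text prompt. The helper names below are illustrative, not taken from the script:

```python
import base64


def frame_to_data_url(jpeg_bytes: bytes) -> str:
    # GPT-4 Vision accepts inline images as base64-encoded data URLs.
    b64 = base64.b64encode(jpeg_bytes).decode("utf-8")
    return f"data:image/jpeg;base64,{b64}"


def vision_message(prompt: str, data_url: str) -> dict:
    # One chat message combining the text prompt and the frame image,
    # in the shape the Chat Completions API expects for vision input.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

In the script, the JPEG bytes would come from OpenCV (e.g. `cv2.imencode(".jpg", frame)`), and the resulting message would be sent to the chat completions endpoint.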
Before running the script, ensure you have the following installed:
- Python 3.x
- OpenAI Python library (openai)
- An OpenAI API key with access to the GPT-4 Vision and tts-1-hd models
- Additional Python libraries: requests, opencv-python, sounddevice, soundfile, and python-dotenv (base64, io, and threading ship with Python's standard library)
- Clone this repository to your local machine.
- Install the required Python libraries, for example:
  pip install openai requests opencv-python sounddevice soundfile python-dotenv
- Set up your OpenAI API key by creating a .env file in the project directory containing your key.
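A minimal .env file, assuming the script loads it with python-dotenv and that the key is stored under the standard OPENAI_API_KEY variable name the openai library reads by default:

```
OPENAI_API_KEY=your_api_key_here
```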
Run the script by executing the following command in your terminal or command prompt:
py GPT_4_Vision_Live_video_description.py
(Use python instead of the Windows-only py launcher on macOS or Linux.)
Adjust the parameters in the live_video_description function to customize the behavior of the script.
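The actual parameter names live in GPT_4_Vision_Live_video_description.py; purely as an illustration of the kind of settings such a function might expose (every name below is hypothetical):

```python
def live_video_description(camera_index=0, frame_interval=2.0,
                           voice="alloy", max_tokens=100):
    """Hypothetical signature -- check the script for the real parameters.

    camera_index:   which camera OpenCV opens (0 = default camera)
    frame_interval: seconds to wait between frames sent to GPT-4 Vision
    voice:          OpenAI TTS voice used to read descriptions aloud
    max_tokens:     cap on the length of each generated description
    """
    # Returning the settings here just makes the sketch inspectable;
    # the real function would run the capture/describe/speak loop.
    return {"camera_index": camera_index, "frame_interval": frame_interval,
            "voice": voice, "max_tokens": max_tokens}
```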
Press q in the video window to stop the script.
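The contextual analysis of consecutive frames mentioned above can be sketched with a small rolling history: recent descriptions are kept and folded into the next prompt so the model can comment on what changed. The class and method names here are illustrative, not taken from the script:

```python
from collections import deque


class FrameContext:
    # Rolling memory of recent frame descriptions, so each new
    # GPT-4 Vision request can reference what came before.
    def __init__(self, maxlen: int = 3):
        self.history = deque(maxlen=maxlen)

    def add(self, description: str) -> None:
        # Oldest entries fall off automatically once maxlen is reached.
        self.history.append(description)

    def prompt(self) -> str:
        # Build the text prompt for the next frame, folding in history.
        if not self.history:
            return "Describe this video frame."
        prior = " | ".join(self.history)
        return (f"Earlier frames showed: {prior}. "
                "Describe this frame, noting any changes.")
```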