This script provides real-time descriptions of live video frames using OpenCV, OpenAI's GPT-4 Vision model, and OpenAI's text-to-speech (TTS) model. It captures frames from the default camera, displays the live video feed, and sends frames to GPT-4 Vision, which generates textual descriptions that take preceding frames into account for context. The TTS model then reads each description aloud.
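As a rough sketch of the GPT-4 Vision side of this pipeline: each captured frame is typically base64-encoded into a data URL and embedded in a chat message alongside a text prompt. The helper names below are illustrative, not taken from the script:

```python
import base64


def frame_to_data_url(jpeg_bytes: bytes) -> str:
    # GPT-4 Vision accepts inline images as base64-encoded data URLs.
    b64 = base64.b64encode(jpeg_bytes).decode("utf-8")
    return f"data:image/jpeg;base64,{b64}"


def vision_message(prompt: str, data_url: str) -> dict:
    # One chat message combining the text prompt and the frame image,
    # in the shape the Chat Completions API expects for vision input.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

In the script, the JPEG bytes would come from OpenCV (e.g. `cv2.imencode(".jpg", frame)`), and the resulting message would be sent to the chat completions endpoint.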
Before running the script, ensure you have the following installed:
- Python 3.x
- OpenAI Python library (openai)
- An OpenAI API key with access to the GPT-4 Vision and tts-1-hd models
- Additional Python libraries: requests, opencv-python, sounddevice, soundfile, and python-dotenv (base64, io, and threading ship with Python's standard library)
- Clone this repository to your local machine.
- Install the required Python libraries, for example:
  pip install openai requests opencv-python sounddevice soundfile python-dotenv
- Set up your OpenAI API key by creating a .env file in the project directory containing your key.
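A minimal .env file, assuming the script loads it with python-dotenv and that the key is stored under the standard OPENAI_API_KEY variable name the openai library reads by default:

```
OPENAI_API_KEY=your_api_key_here
```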
Run the script by executing the following command in your terminal or command prompt:
py GPT_4_Vision_Live_video_description.py
(Use python instead of the Windows-only py launcher on macOS or Linux.)
Adjust the parameters in the live_video_description function to customize the behavior of the script.
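The actual parameter names live in GPT_4_Vision_Live_video_description.py; purely as an illustration of the kind of settings such a function might expose (every name below is hypothetical):

```python
def live_video_description(camera_index=0, frame_interval=2.0,
                           voice="alloy", max_tokens=100):
    """Hypothetical signature -- check the script for the real parameters.

    camera_index:   which camera OpenCV opens (0 = default camera)
    frame_interval: seconds to wait between frames sent to GPT-4 Vision
    voice:          OpenAI TTS voice used to read descriptions aloud
    max_tokens:     cap on the length of each generated description
    """
    # Returning the settings here just makes the sketch inspectable;
    # the real function would run the capture/describe/speak loop.
    return {"camera_index": camera_index, "frame_interval": frame_interval,
            "voice": voice, "max_tokens": max_tokens}
```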
Press q in the video window to stop the script.
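The contextual analysis of consecutive frames mentioned above can be sketched with a small rolling history: recent descriptions are kept and folded into the next prompt so the model can comment on what changed. The class and method names here are illustrative, not taken from the script:

```python
from collections import deque


class FrameContext:
    # Rolling memory of recent frame descriptions, so each new
    # GPT-4 Vision request can reference what came before.
    def __init__(self, maxlen: int = 3):
        self.history = deque(maxlen=maxlen)

    def add(self, description: str) -> None:
        # Oldest entries fall off automatically once maxlen is reached.
        self.history.append(description)

    def prompt(self) -> str:
        # Build the text prompt for the next frame, folding in history.
        if not self.history:
            return "Describe this video frame."
        prior = " | ".join(self.history)
        return (f"Earlier frames showed: {prior}. "
                "Describe this frame, noting any changes.")
```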