Gemini Video Description is a Python project that uses Google Generative AI and the Gemini models to generate a description of a YouTube video from key frames extracted from it. The project is at an early stage, and the code is inspired by this Kaggle notebook.
Using Linux
export GEMINI_API=your_api_key
Using PowerShell
$env:GEMINI_API = "your_api_key"
To use the project, you need a valid API key for Google Generative AI, stored in the GEMINI_API environment variable as shown above. You can get one from here.
You also need to install the required dependencies from the requirements.txt file. You can do this by running the following command:
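As a rough illustration of how a script can pick up the key set above, the sketch below reads GEMINI_API from the environment and fails early with a readable message if it is missing. The helper name load_api_key is hypothetical and is not taken from the project's code; only the variable name GEMINI_API comes from this README.

```python
import os


def load_api_key(env_var: str = "GEMINI_API") -> str:
    """Return the API key stored in env_var (hypothetical helper, not from main.py)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail before any download or API call is attempted.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API key first."
        )
    return key
```

Checking for the key up front avoids downloading a video and extracting frames only to fail at the first API call.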
pip install -r requirements.txt
Then, you can run the main script main.py with the URL of the YouTube video as a command line argument. For example:
python main.py "https://www.youtube.com/watch?v=example"
You can also specify the directory to save the selected frames and the file name to save the video as optional arguments. For example:
python main.py "https://www.youtube.com/watch?v=example" --selected_frame_dir my_frames --video_file my_video.mp4
If you omit these arguments, the script will use the default values selected_frames and temp_video.mp4.
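The command line interface described above could be declared with argparse roughly as follows. This is a sketch reconstructed from the flags and defaults listed in this README, not a copy of main.py; the function name build_parser and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Describe a YouTube video from its key frames."
    )
    # Required positional argument: the video URL.
    parser.add_argument("url", help="URL of the YouTube video")
    # Optional arguments with the defaults documented in this README.
    parser.add_argument(
        "--selected_frame_dir",
        default="selected_frames",
        help="directory in which to save the selected frames",
    )
    parser.add_argument(
        "--video_file",
        default="temp_video.mp4",
        help="file name under which to save the downloaded video",
    )
    return parser


# Example: only one optional argument is overridden, the other keeps its default.
args = build_parser().parse_args(
    ["https://www.youtube.com/watch?v=example", "--selected_frame_dir", "my_frames"]
)
print(args.selected_frame_dir, args.video_file)  # my_frames temp_video.mp4
```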
The script will download the video, extract the key frames, generate text for each frame, and then generate a final description of the video using the Gemini models. The script will also plot and show the selected frames and save the figure to the output directory.
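This README does not say how key frames are chosen, so the following is only a sketch of one common approach: keep a frame when it differs enough from the last kept frame. Frames are modelled here as flat lists of grayscale values to keep the example dependency-free; the names select_key_frames and mean_abs_diff and the threshold value are assumptions, and the real script may use a different measure (for example a similarity metric from scikit-image on NumPy arrays).

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # flattened grayscale pixels; a stand-in for an image array


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(
    frames: Sequence[Frame],
    threshold: float = 10.0,
    distance: Callable[[Frame, Frame], float] = mean_abs_diff,
) -> List[int]:
    """Return indices of frames that differ from the last selected frame
    by more than threshold. The first frame is always selected."""
    if not frames:
        return []
    selected = [0]
    for i in range(1, len(frames)):
        if distance(frames[selected[-1]], frames[i]) > threshold:
            selected.append(i)
    return selected


# Five tiny 4-pixel "frames": two near-duplicates are skipped.
frames = [[0] * 4, [1] * 4, [50] * 4, [52] * 4, [200] * 4]
print(select_key_frames(frames))  # [0, 2, 4]
```

Comparing against the last selected frame, rather than the immediately preceding one, keeps slow gradual changes from being skipped entirely.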
The project requires the following Python modules:
- google-generativeai==0.3.2
- matplotlib
- mpld3
- numpy
- opencv-python==4.9.0.80
- Pillow
- scikit-image==0.22.0
- tqdm==4.66.1
You can install them using the requirements.txt file as mentioned above.
The project is still under development and has the following TODO items:
- Optimize Frame Description Prompt
- Extend Prompt Functionality for Universal Video Descriptions
- Implement Chat History Archival
- Integrate Configuration File Reader
- Implement Enhanced Logging with Console Output