Iemontine/AudioVideoDescriptiveAI

Dynamically describes video content with OpenAI's GPT-4o model by embedding temporal context and audio context into the visual data. The audio context comes from an audio classification neural network (PANNs).
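As a rough illustration of the idea, the sketch below shows one way temporal and audio context could be attached to sampled frames as text before they are sent to a vision-language model. It is not taken from the repository: the names `AudioTag`, `frame_caption`, and `build_prompt`, the score threshold, and the caption format are all hypothetical, and the audio tags are assumed to have already been produced by a classifier such as PANNs.

```python
from dataclasses import dataclass


@dataclass
class AudioTag:
    """Hypothetical container for one audio classifier output (label + confidence)."""
    label: str
    score: float


def frame_caption(timestamp_s: float, tags: list[AudioTag],
                  top_k: int = 2, min_score: float = 0.3) -> str:
    """Format a frame's timestamp and its most confident audio tags as one line of text."""
    # Keep only tags above the (assumed) confidence threshold, highest score first.
    kept = sorted((t for t in tags if t.score >= min_score),
                  key=lambda t: t.score, reverse=True)[:top_k]
    audio = ", ".join(f"{t.label} ({t.score:.2f})" for t in kept) or "no confident audio tags"
    minutes, seconds = divmod(int(timestamp_s), 60)
    return f"[{minutes:02d}:{seconds:02d}] audio: {audio}"


def build_prompt(captions: list[str]) -> str:
    """Join per-frame context lines under a short instruction (assumed wording)."""
    header = "Describe the video. Each frame is annotated with its time and audio context:"
    return "\n".join([header, *captions])


if __name__ == "__main__":
    tags = [AudioTag("Speech", 0.91), AudioTag("Music", 0.42), AudioTag("Dog", 0.10)]
    print(build_prompt([frame_caption(75.4, tags), frame_caption(5, [])]))
```

In a full pipeline, text like this would accompany the frame images in the model request; how the repository actually combines them is not stated in this summary.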

Jupyter Notebook

This repository is not active