# DL-Visual-Question-Answering

This Visual Question Answering (VQA) project provides a model with a simple GUI that accepts both images and videos. It uses OpenAI's CLIP to encode the image and the question into a joint embedding, and GPT-2 to decode that embedding into an answer. The model is built on the VQA Version 2 dataset, which contains 265,016 images, each paired with multiple questions and answers.
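The CLIP-to-GPT-2 handoff described above requires bridging two different embedding spaces. A minimal sketch of one common approach, a learned projection that maps a CLIP embedding to a short "prefix" of GPT-2 token embeddings which GPT-2 then decodes, is shown below. The dimensions (512 for CLIP ViT-B/32, 768 for GPT-2), the prefix length, and the `PrefixMapper` name are illustrative assumptions, not the repository's actual implementation:

```python
import torch
import torch.nn as nn

# Assumed dimensions: CLIP ViT-B/32 outputs 512-d embeddings;
# GPT-2 (small) uses 768-d token embeddings. PREFIX_LEN is arbitrary.
CLIP_DIM, GPT2_DIM, PREFIX_LEN = 512, 768, 10

class PrefixMapper(nn.Module):
    """Hypothetical bridge: project a CLIP embedding into a sequence
    of pseudo-token embeddings that GPT-2 can decode into an answer."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(CLIP_DIM, PREFIX_LEN * GPT2_DIM)

    def forward(self, clip_embedding):
        # (batch, 512) -> (batch, PREFIX_LEN, 768)
        prefix = self.proj(clip_embedding)
        return prefix.view(-1, PREFIX_LEN, GPT2_DIM)

mapper = PrefixMapper()
fake_clip = torch.randn(2, CLIP_DIM)  # stand-in for a real CLIP encoding
prefix = mapper(fake_clip)
print(prefix.shape)  # torch.Size([2, 10, 768])
```

In a full pipeline, the resulting prefix would be concatenated with the question's token embeddings and fed to GPT-2 for autoregressive answer generation.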
