This project demonstrates zero-shot image captioning using two popular models from computer vision and natural language processing: VisualBERT and Show and Tell.
Requirements
- Python
- PyTorch
- transformers
- torchvision
- tensorflow
The repository contains two Jupyter notebooks, one for VisualBERT and the other for Show and Tell.

The VisualBERT notebook covers the following steps:
- Load the VisualBERT Model
- Load the COCO Dataset
- Extract Image Features Using ResNet18
- Fine-Tune VisualBERT (this is where the errors occur)
The Show and Tell notebook covers the following steps:
- Load the COCO Dataset
- Encode the captions
- Preprocess the Images
- Build the Model
- Train the Model (This is where the errors occur)
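A minimal Keras sketch of the Show and Tell pipeline above. The toy captions, feature dimension, and layer sizes are illustrative assumptions rather than the notebook's actual configuration; here the image embedding initialises the LSTM state, whereas the original Show and Tell paper feeds it to the decoder as the first input step:

```python
import tensorflow as tf

# hypothetical toy captions standing in for COCO annotations
captions = tf.constant(["<start> a dog runs <end>", "<start> a cat sleeps <end>"])

# encode the captions as fixed-length integer sequences
vectorizer = tf.keras.layers.TextVectorization(output_sequence_length=6)
vectorizer.adapt(captions)
seqs = vectorizer(captions)                      # shape (2, 6)

vocab = vectorizer.vocabulary_size()
img_feats = tf.random.normal((2, 2048))          # stand-in CNN encoder features

# LSTM decoder conditioned on the image, predicting the next word at each step
embed = tf.keras.layers.Embedding(vocab, 64)
lstm = tf.keras.layers.LSTM(64, return_sequences=True)
proj = tf.keras.layers.Dense(vocab)

init = tf.keras.layers.Dense(64)(img_feats)      # project image into LSTM state size
logits = proj(lstm(embed(seqs), initial_state=[init, tf.zeros_like(init)]))
print(logits.shape)                              # (batch, sequence_length, vocab_size)
```

Training would minimise the cross-entropy between these logits and the caption tokens shifted by one position.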