TFE: A Python repository from Robin-Jacobs

This repository contains the image captioning model and the database manipulation code implemented for the Master's thesis "Improving Image Captioning with Dense Annotation". The model is based on an existing model: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

The following files are required to run this model. From the MS COCO dataset:

captions_train2017.json
captions_val2017.json

From the Visual Genome dataset:

objects.json
image_data.json

These files can be downloaded from the MS COCO website (http://cocodataset.org/#download) and the Visual Genome website (https://visualgenome.org/api/v0/api_home.html).
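Before running anything, it can help to confirm that the four files listed above are in place. The following is a minimal sketch, not part of the repository: the helper name missing_files and the idea of keeping all four files in a single directory are assumptions made for illustration, so adjust the paths to wherever the scripts actually expect the data.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = [
    "captions_train2017.json",  # MS COCO
    "captions_val2017.json",    # MS COCO
    "objects.json",             # Visual Genome
    "image_data.json",          # Visual Genome
]


def missing_files(data_dir):
    """Return the required file names that are not present in data_dir.

    Assumption: all four files sit directly in one directory.
    """
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, calling missing_files("data") returns an empty list once every file is present in the (hypothetical) data directory, and otherwise returns the names that still need to be downloaded.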

Since the model uses its own database, DatabaseManipulation.py must be run first. Then, createInputFiles.py prepares the data for training. After running these two scripts, you can run train.py. Checkpoints are saved during training, so training can be resumed from them.
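The order of the three scripts can be captured in a small driver. This is only a sketch under assumptions: it supposes that each script is launched from the repository root without command-line arguments, which this README does not confirm, and the helper names build_pipeline_commands and run_pipeline are hypothetical.

```python
import subprocess
import sys

# Script names and their order come from the paragraph above.
PIPELINE_SCRIPTS = [
    "DatabaseManipulation.py",  # builds the project's own database
    "createInputFiles.py",      # prepares the data for training
    "train.py",                 # trains the model and saves checkpoints
]


def build_pipeline_commands(python=sys.executable):
    """Return one command per script, in the order they must run.

    Assumption: the scripts take no command-line arguments.
    """
    return [[python, script] for script in PIPELINE_SCRIPTS]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in turn; check=True stops at the first failure."""
    for command in commands:
        runner(command, check=True)
```

Calling run_pipeline(build_pipeline_commands()) from the repository root would then run the three steps one after the other and raise subprocess.CalledProcessError if a step exits with an error, so training never starts on incomplete data.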

When the training is completed, you can caption an image with the following command:

python caption.py --img='path/to/image.jpeg' --model='path/to/BEST_checkpoint_coco_5_cap_per_img_5_min_word_freq.pth.tar' --word_map='path/to/WORDMAP_coco_5_cap_per_img_5_min_word_freq.json' --beam_size=5
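To caption several images without retyping the command, the argument list can be built programmatically. The sketch below is an illustration rather than code from the repository: the function name caption_command is hypothetical, and only the four flags shown in the command above are used.

```python
import shlex


def caption_command(img, model, word_map, beam_size=5):
    """Build the caption.py argument list using the flags shown above."""
    return [
        "python",
        "caption.py",
        f"--img={img}",
        f"--model={model}",
        f"--word_map={word_map}",
        f"--beam_size={beam_size}",
    ]


def as_shell_string(command):
    """Render the argument list as a shell-safe string (Python 3.8+)."""
    return shlex.join(command)
```

The returned list can be passed to subprocess.run once per image, and as_shell_string gives a quoted version that is safe to paste into a terminal even when a path contains spaces.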
