/TFE

This is the work accomplished on the databases for the Master Thesis "Improving Image Captioning with Dense Annotation"


This repository contains the image captioning model and the database manipulation code implemented for the Master Thesis "Improving Image Captioning with Dense Annotation". The model is based on an existing model: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

The files required to run this model are:

From the MS COCO library:

  • captions_train2017.json
  • captions_val2017.json

From the Visual Genome library:

  • objects.json
  • image_data.json

These files can be found on the respective websites: http://cocodataset.org/#download (MS COCO) and https://visualgenome.org/api/v0/api_home.html (Visual Genome).
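Before running anything, it can help to check that the four files listed above are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_files` and the idea of keeping all four files in a single folder are assumptions made for illustration, and the scripts may expect a different layout.

```python
from pathlib import Path

# File names taken from the lists above; the grouping by dataset is for readability only.
REQUIRED_FILES = {
    "MS COCO": ["captions_train2017.json", "captions_val2017.json"],
    "Visual Genome": ["objects.json", "image_data.json"],
}


def missing_files(data_dir):
    """Return the required file names that are not present in data_dir.

    Hypothetical helper: it assumes all four files sit directly in one folder.
    """
    root = Path(data_dir)
    return [
        name
        for names in REQUIRED_FILES.values()
        for name in names
        if not (root / name).is_file()
    ]


if __name__ == "__main__":
    # "data" is a placeholder folder name; adjust it to where the files were downloaded.
    missing = missing_files("data")
    print("Missing files:", missing if missing else "none")
```

An empty list means every required file was found in the given folder.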

Since the model uses its own database, DatabaseManipulation.py must be run first. Then createInputFiles.py prepares the data for training. After these two scripts have finished, you can run train.py. Checkpoints are saved during training so that it can be resumed from them.
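The order described above can be sketched as a small driver script. This is only an illustration: it assumes the three scripts are in the current directory and can be launched without command-line arguments, which this README does not state. The names `PIPELINE`, `pipeline_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Script names and their order come from the paragraph above.
PIPELINE = ["DatabaseManipulation.py", "createInputFiles.py", "train.py"]


def pipeline_commands(python=sys.executable):
    """Build one command per script, in the order they must run."""
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    """Run each step in turn (assumes the scripts need no arguments)."""
    for command in pipeline_commands():
        # check=True raises CalledProcessError if a step fails,
        # so later steps never run on incomplete data.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the repository folder would then execute the three steps one after the other and stop at the first failure.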

When the training is completed, you can caption an image with the following command:

python caption.py --img='path/to/image.jpeg' --model='path/to/BEST_checkpoint_coco_5_cap_per_img_5_min_word_freq.pth.tar' --word_map='path/to/WORDMAP_coco_5_cap_per_img_5_min_word_freq.json' --beam_size=5
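The checkpoint and word map names in the command above share the suffix `coco_5_cap_per_img_5_min_word_freq`. The sketch below shows one way such names can be composed from a dataset name, a number of captions per image and a minimum word frequency. That reading of the suffix is an assumption inferred from the file names themselves, and the helper functions are hypothetical; check createInputFiles.py and train.py for the values actually used.

```python
def base_name(dataset="coco", captions_per_image=5, min_word_freq=5):
    """Compose the shared suffix seen in the file names (assumed pattern)."""
    return f"{dataset}_{captions_per_image}_cap_per_img_{min_word_freq}_min_word_freq"


def word_map_name(base):
    """File name to pass to --word_map."""
    return f"WORDMAP_{base}.json"


def best_checkpoint_name(base):
    """File name to pass to --model."""
    return f"BEST_checkpoint_{base}.pth.tar"


base = base_name()
print(word_map_name(base))
print(best_checkpoint_name(base))
```

If the data were prepared with different settings, the file names passed to `--model` and `--word_map` would presumably change accordingly.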