mtanti's PhD

Code used for running experiments for my PhD thesis (link to thesis will be included later). Part of the code was also used for the paper "Transfer learning from language models to image caption generators: Better models may not transfer better".

The thesis analyses the different neural network architectures available for image caption generation.

Works on Python 3.

Dependencies

Python dependencies (install all with pip):

Download Karpathy's Flickr8K, Flickr30K, and MSCOCO captions and put them in mtanti-phd/datasets/capgen/DATASET/karpathy/dataset.json where DATASET is flickr8k, flickr30k, or mscoco (rename the files to dataset.json!).
Download the Flickr8K images and put them in mtanti-phd/datasets/capgen/flickr8k/images.
Download the Flickr30K images and put them in mtanti-phd/datasets/capgen/flickr30k/images.
Download the MSCOCO 2014 images and put them all together in mtanti-phd/datasets/capgen/mscoco/images.
Download LM1B Google News corpus and extract it in mtanti-phd/datasets/text/lm1b/1-billion-word-language-modeling-benchmark-master.
Download the MSCOCO Evaluation toolkit and extract it in mtanti-phd/tools/coco-caption-master.
Open mtanti-phd/experiments/thesis/framework/config/machine_specific.py and set base_dir to the path of the mtanti-phd directory and val_batch_size to the maximum batch size that your GPU can process (start with a low number like 100 and keep increasing until you get an out of memory error, then use the largest value that worked).
Open mtanti-phd/experiments/thesis/framework/config/general.py and set debug to True or False (True is used to run a quick test).
Run mtanti-phd/experiments/thesis/dataset_maker.py to pre-process all the data and store it in mtanti-phd/experiments/thesis/data.
Remove all files inside mtanti-phd/experiments/thesis/hyperparams and mtanti-phd/experiments/thesis/results as results are not re-computed if already saved.
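
The setup steps above can be checked with a short script before running any experiments. What follows is a minimal sketch and not part of the repository: the function names missing_paths and clear_saved_outputs are hypothetical, the paths are copied from the steps above, and it assumes that "all files" in the last step means every regular file at any depth inside the hyperparams and results directories (the directories themselves are kept).

```python
from pathlib import Path

# Paths copied from the setup steps, relative to the mtanti-phd directory.
DATASETS = ("flickr8k", "flickr30k", "mscoco")
EXPECTED_PATHS = (
    [f"datasets/capgen/{name}/karpathy/dataset.json" for name in DATASETS]
    + [f"datasets/capgen/{name}/images" for name in DATASETS]
    + [
        "datasets/text/lm1b/1-billion-word-language-modeling-benchmark-master",
        "tools/coco-caption-master",
    ]
)

# Directories whose saved files stop results from being re-computed.
SAVED_OUTPUT_DIRS = ("experiments/thesis/hyperparams", "experiments/thesis/results")


def missing_paths(base_dir):
    """Return the expected paths that do not exist under base_dir."""
    base = Path(base_dir)
    return [rel for rel in EXPECTED_PATHS if not (base / rel).exists()]


def clear_saved_outputs(base_dir):
    """Delete every regular file under the saved-output directories.

    Assumption: subdirectories are kept and only files are removed.
    Returns the number of files deleted.
    """
    removed = 0
    for rel in SAVED_OUTPUT_DIRS:
        directory = Path(base_dir) / rel
        if not directory.is_dir():
            continue
        # Materialise the listing first so deletion does not disturb iteration.
        for path in list(directory.rglob("*")):
            if path.is_file():
                path.unlink()
                removed += 1
    return removed
```

Calling missing_paths with the same directory used for base_dir returns an empty list once every download is in place. clear_saved_outputs deletes files permanently, so only run it when the saved hyperparameters and results are no longer needed.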

All the instructions to run the experiments can be found inside mtanti-phd/experiments/thesis.