Code used for running experiments for my PhD thesis (link to thesis will be included later). Part of the code was also used for the paper "Transfer learning from language models to image caption generators: Better models may not transfer better".
The thesis analyses the different neural network architectures available for image caption generation.
Works on Python 3.
Python dependencies (install all with `pip`):
- tensorflow==1.4
- numpy
- scipy
- h5py
- skopt (published on PyPI as `scikit-optimize`)
- nltk
- PIL (published on PyPI as `Pillow`)
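A quick way to confirm that the dependencies above are importable is sketched below. It is a minimal check, not part of the repository: the function name `missing_modules` is hypothetical, and it only tests whether each import name can be found, not whether the installed TensorFlow is version 1.4.

```python
import importlib.util

# Import names taken from the dependency list above.
REQUIRED_MODULES = ["tensorflow", "numpy", "scipy", "h5py", "skopt", "nltk", "PIL"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level import names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` with no arguments returns the subset of the listed modules that still needs to be installed; an empty list means all of them were found.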
- Download Karpathy's Flickr8K, Flickr30K, and MSCOCO captions and put them in `mtanti-phd/datasets/capgen/DATASET/karpathy/dataset.json`, where DATASET is flickr8k, flickr30k, or mscoco (rename the files to `dataset.json`!).
- Download the Flickr8K images and put them in `mtanti-phd/datasets/capgen/flickr8k/images`.
- Download the Flickr30K images and put them in `mtanti-phd/datasets/capgen/flickr30k/images`.
- Download the MSCOCO 2014 images and put them all together in `mtanti-phd/datasets/capgen/mscoco/images`.
- Download the LM1B Google News corpus and extract it in `mtanti-phd/datasets/text/lm1b/1-billion-word-language-modeling-benchmark-master`.
- Download the MSCOCO Evaluation toolkit and extract it in `mtanti-phd/tools/coco-caption-master`.
- Open `mtanti-phd/experiments/thesis/framework/config/machine_specific.py` and set `base_dir` to the directory of mtanti-phd and `val_batch_size` to the maximum batch size that can be processed by your GPU (start with a low number like 100 and keep increasing until you get an out-of-memory error, then use the largest value that worked).
- Open `mtanti-phd/experiments/thesis/framework/config/general.py` and set `debug` to True or False (True is used to run a quick test).
- Run `mtanti-phd/experiments/thesis/dataset_maker.py` to pre-process all the data and store it in `mtanti-phd/experiments/thesis/data`.
- Remove all files inside `mtanti-phd/experiments/thesis/hyperparams` and `mtanti-phd/experiments/thesis/results`, as results are not re-computed if already saved.
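Before running `dataset_maker.py`, it can help to confirm that the downloads listed above ended up in the expected places. The sketch below is not part of the repository. The helper names `expected_paths` and `missing_paths` are hypothetical, and `base_dir` is assumed to be the path of the mtanti-phd directory itself.

```python
import os

# Dataset directory names from the setup steps above.
DATASETS = ["flickr8k", "flickr30k", "mscoco"]


def expected_paths(base_dir):
    """List the files and directories that the setup steps are expected to create."""
    paths = []
    for dataset in DATASETS:
        root = os.path.join(base_dir, "datasets", "capgen", dataset)
        paths.append(os.path.join(root, "karpathy", "dataset.json"))
        paths.append(os.path.join(root, "images"))
    paths.append(os.path.join(
        base_dir, "datasets", "text", "lm1b",
        "1-billion-word-language-modeling-benchmark-master"))
    paths.append(os.path.join(base_dir, "tools", "coco-caption-master"))
    return paths


def missing_paths(base_dir):
    """Return the expected paths that do not exist yet under base_dir."""
    return [path for path in expected_paths(base_dir) if not os.path.exists(path)]
```

An empty list from `missing_paths(base_dir)` means every location named in the steps exists; it does not verify that the downloaded contents are complete.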
All the instructions to run the experiments can be found inside `mtanti-phd/experiments/thesis`.
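Since saved results are not re-computed, clearing the `hyperparams` and `results` directories before a fresh run can be scripted. The sketch below is not part of the repository. The function name `remove_all_files` is hypothetical, and it assumes that files in subdirectories should also be deleted while the directory structure itself is kept.

```python
import os


def remove_all_files(directory):
    """Delete every file under `directory`, keeping the directories themselves.

    Returns the list of deleted file paths.
    """
    removed = []
    for current_dir, _subdirs, filenames in os.walk(directory):
        for filename in filenames:
            path = os.path.join(current_dir, filename)
            os.remove(path)
            removed.append(path)
    return removed
```

It would be called once for each of the two directories, for example `remove_all_files(os.path.join(base_dir, "experiments", "thesis", "results"))`, where `base_dir` is assumed to be the path of the mtanti-phd directory. Deletion is permanent, so any results worth keeping should be copied elsewhere first.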