The `notebooks/` folder contains the following Jupyter notebooks:

- `ev_metrics_summary.ipynb`: includes an explanation of the four evaluation metrics that we have used (BLEU, METEOR, CIDEr and ROUGE-L), with references to the corresponding papers as well as examples of how to compute these metrics using Python libraries or the `evaluation` module (see the short sketch after this list).
- `instagram_captions.ipynb`: a few statistics regarding the captions of the `instagram` dataset.
- `visualize_results.ipynb`: shows how to use the code in `caption.py` to visualize the attention process of the model while it predicts a caption.
- `word_embeddings.ipynb`: a small demonstration of how to load and use the `word2vec` and `emoji2vec` word embedding models.
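For instance, BLEU can be computed directly with NLTK (a minimal sketch, independent of the notebook's exact code; CIDEr and ROUGE-L are typically computed with the bundled COCO caption evaluation code instead):

```python
# Minimal BLEU example with NLTK (illustrative; not the notebook's exact code).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "dog", "runs", "on", "the", "beach"]]        # tokenized reference(s)
candidate = ["a", "dog", "is", "running", "on", "the", "beach"]  # tokenized hypothesis

# BLEU-4 with smoothing to avoid zero scores on short sentences
smoothie = SmoothingFunction().method1
score = sentence_bleu(reference, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smoothie)
print(f"BLEU-4: {score:.3f}")
```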
The `preprocessing/` folder contains three Python scripts that allow you to preprocess the `flickr8k`, `flickr30k` and `instagram` datasets, assuming that they are stored in the folders `data/datasets/flickr8k`, `data/datasets/flickr30k` and `data/datasets/instagram`, respectively, as follows:
```bash
python preprocessing/flickr.py -d flickr8k    # preprocess 'flickr8k' dataset
python preprocessing/flickr.py -d flickr30k   # preprocess 'flickr30k' dataset
python preprocessing/instagram.py             # preprocess 'instagram' dataset
python preprocessing/flickr-insta.py          # preprocess and combine the 'flickr8k' (or 'flickr30k') and the 'instagram' datasets
```
The parameters that can be specified are:

- `-min` or `--minimal-length`: minimum length of the captions. The default is `2`.
- `-max` or `--maximal-length`: maximum length of the captions. The default is `50`.
- `-wf` or `--min-word-frequency`: minimum frequency a word must have to be included in the word map / vocabulary.
- `-c` or `--captions-per-image`: number of captions per image. The default is `5`. However, be aware that the `instagram` dataset contains only one caption per image.
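For example, a possible invocation combining these flags (the values here are only illustrative):

```bash
# Keep captions of 2-50 tokens, words seen at least 5 times, 5 captions per image
# (all values illustrative, not recommended settings).
python preprocessing/flickr.py -d flickr8k -min 2 -max 50 -wf 5 -c 5
```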
Besides, when running the `preprocessing/instagram.py` script, the following additional parameters can be specified:

- `-t` or `--train-size`: proportion of the dataset that is used for training. The default is `0.60` (60%).
- `-v` or `--val-size`: proportion of the dataset that is used for validation. The default is `0.20` (20%). The size of the test split is computed as `1 - (train-size + val-size)`, so with the default values the test split is 20% of the dataset.
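For example (the split proportions below are illustrative):

```bash
# 70% train, 15% validation; the remaining 15% becomes the test split.
python preprocessing/instagram.py -t 0.70 -v 0.15
```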
When running the `preprocessing/flickr.py` script, one additional parameter can be specified:

- `-d` or `--dataset`: possible values are `'flickr8k'` or `'flickr30k'`. The default is `'flickr8k'`.
The output of this process includes the following files:

- word map (`WORDMAP_datasetname.json`): a `.json` file containing a word-to-index mapping.
- preprocessed images (`SPLIT_IMAGES_datasetname.hdf5`) for the `TRAIN`, `VAL` and `TEST` splits.
- encoded captions (`SPLIT_CAPLENS_datasetname.json`) for the `TRAIN`, `VAL` and `TEST` splits. These files contain the encoded captions with a fixed length equal to the `--maximal-length` argument. The captions are encoded using the word map in `WORDMAP_datasetname.json`.
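A rough sketch of how these output files can be read back (the paths, file names and the HDF5 dataset key below are assumptions for illustration, not guaranteed to match the repository's exact conventions):

```python
import json
import h5py

# Load the word map (word -> index). Path and file name assumed for illustration.
with open("data/datasets/flickr8k/WORDMAP_flickr8k.json") as f:
    word_map = json.load(f)
print(len(word_map), "words in the vocabulary")

# Load the preprocessed training images; the 'images' key is an assumption.
with h5py.File("data/datasets/flickr8k/TRAIN_IMAGES_flickr8k.hdf5", "r") as h:
    images = h["images"]
    print(images.shape)  # e.g. (num_images, 3, 256, 256)

# Load the encoded captions for the training split (file name assumed).
with open("data/datasets/flickr8k/TRAIN_CAPLENS_flickr8k.json") as f:
    captions = json.load(f)
```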
The `train.py` script allows you to train a model from scratch or to continue training a model from a given checkpoint. Due to the high number of parameters, these are modified directly in the code; check the code to see which parameters can be specified.
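As a rough illustration of how resuming from a checkpoint typically works in PyTorch (the checkpoint path and the keys stored inside it are assumptions, not the repository's exact format; check `train.py` for the real parameter names):

```python
import torch

# Illustrative only: the usual PyTorch resume pattern, not the repository's code.
checkpoint = torch.load("checkpoint_flickr8k.pth.tar", map_location="cpu")
start_epoch = checkpoint.get("epoch", 0) + 1
model = checkpoint["model"]          # assumed: the full model object is stored
optimizer = checkpoint["optimizer"]  # assumed: optimizer state is stored too
model.train()                        # continue training from where the checkpoint left off
```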
Once the parameters have been set, you can execute the following command to train the model:

```bash
python train.py
```
Note: the code assumes that a file `EMBEDDINGS_dataset.pt` containing the embeddings exists and loads it. However, it is also possible to train your own embeddings from scratch.
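As a rough illustration of what loading such a file might look like (a sketch under assumptions; the actual tensor layout stored in `EMBEDDINGS_dataset.pt` may differ):

```python
import torch
import torch.nn as nn

# Assumed: the .pt file stores a (vocab_size, embedding_dim) float tensor
# aligned with the indices in the word map.
pretrained = torch.load("EMBEDDINGS_flickr8k.pt", map_location="cpu")

# Initialize an embedding layer from the pretrained vectors (fine-tuning enabled).
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
print(embedding.weight.shape)
```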
The `evaluate.py` script evaluates a model given the corresponding checkpoint, making use of the `evaluation` module. By default, all the metrics are calculated: BLEU (1, 2, 3 and 4), METEOR, CIDEr and ROUGE-L. Check the code to see which parameters can be specified. Once the parameters have been set, you can execute the following command to evaluate the model:

```bash
python evaluate.py
```
This repository includes code from the following repositories:
- Microsoft COCO Caption Evaluation, from Tsung-Yi Lin: https://github.com/tylin/coco-caption
- image-caption-metrics repository, from EricWWWW: https://github.com/EricWWWW/image-caption-metrics
- A PyTorch tutorial for Image Captioning, from Sagar Vinodababu: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning