Project for IDS 705: Principles of Machine Learning
Reference for network framework and training code: a-PyTorch-Tutorial-to-Image-Captioning.
Reference paper: Show, Attend and Tell.
Data source: Flickr 8k Dataset.
- Clone this repository to your server:
$ git clone git@github.com:wkhalil/Image-Caption.git
- Before training and testing, make sure all required libraries are installed with compatible versions (see requirements.txt).
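Before running anything, a quick import check can confirm the core libraries are present. The package list below is an assumption based on a typical PyTorch captioning pipeline; requirements.txt is authoritative:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Assumed core dependencies; see requirements.txt for the full pinned list.
print(missing_packages(["torch", "torchvision", "numpy", "PIL"]))
```

An empty list means the assumed dependencies are importable in the current environment.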
- Open the project with an interpreter from a compatible environment.
- On macOS Catalina, apply the workaround described at https://stackoverflow.com/questions/48290403/process-finished-with-exit-code-134-interrupted-by-signal-6-sigabrt to avoid a potential crash (SIGABRT, exit code 134).
- Check the directory tree below (missing folders can be created manually; most contain only a place_holder.txt).
- Download the Flickr 8k Dataset and store the folder of images in the project as './inputs/Images'.
Directory structure required before training:
.
├── README.md
├── caption.py
├── config.py
├── create_input_files.py
├── datasets.py
├── eval.py
├── inputs
│   ├── Images
│   ├── Intermediate_files
│   │   └── place_holder.txt
│   └── captions.txt
├── logs
│   └── log
│       ├── train
│       │   └── place_holder.txt
│       └── val
│           └── place_holder.txt
├── model_history
│   └── place_holder.txt
├── models.py
├── output
│   └── place_holder.txt
├── requirements.txt
├── test_results
│   └── place_holder.txt
├── train.py
└── utils.py
$ python create_input_files.py
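create_input_files.py prepares the files under inputs/Intermediate_files, including the word map whose filename (WORDMAP_flickr8k_5_cap_per_img_5_min_word_freq.json) implies a vocabulary built from word frequencies with a minimum count of 5. A minimal sketch of that vocabulary-building step, assuming whitespace-tokenized captions (function and token names are illustrative, not the project's actual code):

```python
from collections import Counter

def build_word_map(captions, min_word_freq=5):
    """Map each sufficiently frequent word to an index and reserve special
    tokens, as suggested by the WORDMAP_*_min_word_freq.json filename."""
    freq = Counter(word for caption in captions for word in caption.split())
    words = [w for w in freq if freq[w] >= min_word_freq]
    word_map = {w: i + 1 for i, w in enumerate(sorted(words))}
    # Special tokens commonly used in Show, Attend and Tell pipelines.
    word_map["<unk>"] = len(word_map) + 1
    word_map["<start>"] = len(word_map) + 1
    word_map["<end>"] = len(word_map) + 1
    word_map["<pad>"] = 0
    return word_map
```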
- There are two choices for training: to train from scratch without previous checkpoints, set checkpoints=None in config.py; otherwise training resumes from the current best model (the default, i.e. the latest model checkpoint). Then run train.py:
$ python train.py
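The two modes above differ only in the checkpoint setting in config.py (the variable name follows the text above; the example path is the checkpoint filename used later in this README):

```python
# In config.py: train from scratch ...
checkpoints = None
# ... or resume from a saved model (example path from this README):
# checkpoints = './model_history/best_0417_20.pth.tar'
```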
First install pycocoevalcap for the CIDEr and SPICE metrics. (Problems with SPICE persist even after following the installation instructions; others have reported the same issue.)
$ python eval.py
All generated captions will be stored under the test_results directory.
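For intuition, the captioning metrics compare each generated caption against the reference captions (Flickr8k provides five per image). Below is a minimal sketch of clipped unigram precision, the building block of BLEU-1; this is an illustration only, since eval.py and pycocoevalcap implement the full metrics:

```python
from collections import Counter

def unigram_precision(hypothesis, references):
    """Clipped unigram precision: the fraction of hypothesis words that
    appear in some reference, with counts clipped per reference maximum."""
    hyp_counts = Counter(hypothesis)
    max_ref = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref[word] = max(max_ref[word], count)
    clipped = sum(min(c, max_ref[w]) for w, c in hyp_counts.items())
    return clipped / max(len(hypothesis), 1)
```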
Three choices:
- Generate one caption for a randomly selected input image:
$ python caption.py
- Generate captions for multiple randomly selected input images, e.g. 6 images:
$ python caption.py --num=6
- Generate a caption for a specified image, with a specified model and save path, e.g.:
$ python caption.py --word_map='./inputs/Intermediate_files/WORDMAP_flickr8k_5_cap_per_img_5_min_word_freq.json' --beam_size=5 --img='./inputs/Images/2877424957_9beb1dc49a.jpg' --save='./test_results/gen_3' --model='./model_history/best_0417_20.pth.tar'
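The flags above suggest caption.py parses its arguments roughly as follows. This is a sketch inferred from the usage examples in this README; the defaults and help strings are assumptions, not the project's actual values:

```python
import argparse

def build_parser():
    # Flags inferred from the caption.py commands shown above.
    parser = argparse.ArgumentParser(description="Generate image captions")
    parser.add_argument("--num", type=int, default=1,
                        help="number of randomly selected images to caption")
    parser.add_argument("--beam_size", type=int, default=5,
                        help="beam width for caption decoding")
    parser.add_argument("--img", help="path to a specific input image")
    parser.add_argument("--model", help="path to a saved model checkpoint")
    parser.add_argument("--word_map", help="path to the WORDMAP json file")
    parser.add_argument("--save", help="path for saved caption results")
    return parser
```

Note that argparse accepts both `--num=6` and `--num 6`, but not `--num = 6` with spaces around the equals sign.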