Project for IDS 705: Principles of Machine Learning
Reference for network framework and training code: a-PyTorch-Tutorial-to-Image-Captioning.
Reference paper: Show, Attend and Tell.
Data source: Flickr 8k Dataset.
- Clone this repository to your server:
$ git clone git@github.com:wkhalil/Image-Caption.git
- Before training and testing, make sure all required libraries are installed with compatible versions (see requirements.txt).
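Before running anything, a quick import check can confirm the core libraries are present. The package list below is an assumption based on a typical PyTorch captioning pipeline; requirements.txt is authoritative:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Assumed core dependencies; see requirements.txt for the full pinned list.
print(missing_packages(["torch", "torchvision", "numpy", "PIL"]))
```

An empty list means the assumed dependencies are importable in the current environment.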
- Open the project with an interpreter from a compatible environment.
- On macOS Catalina, apply the workaround described at https://stackoverflow.com/questions/48290403/process-finished-with-exit-code-134-interrupted-by-signal-6-sigabrt to avoid a potential crash (SIGABRT, exit code 134).
- Check the directory tree below (missing folders can be created manually; most contain only a place_holder.txt).
- Download the Flickr 8k Dataset and store the folder of images in the project as './inputs/Images'.
Directory structure required before training:
.
├── README.md
├── caption.py
├── config.py
├── create_input_files.py
├── datasets.py
├── eval.py
├── inputs
│   ├── Images
│   ├── Intermediate_files
│   │   └── place_holder.txt
│   └── captions.txt
├── logs
│   └── log
│       ├── train
│       │   └── place_holder.txt
│       └── val
│           └── place_holder.txt
├── model_history
│   └── place_holder.txt
├── models.py
├── output
│   └── place_holder.txt
├── requirements.txt
├── test_results
│   └── place_holder.txt
├── train.py
└── utils.py
$ python create_input_files.py
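create_input_files.py prepares the files under inputs/Intermediate_files, including the word map whose filename (WORDMAP_flickr8k_5_cap_per_img_5_min_word_freq.json) implies a vocabulary built from word frequencies with a minimum count of 5. A minimal sketch of that vocabulary-building step, assuming whitespace-tokenized captions (function and token names are illustrative, not the project's actual code):

```python
from collections import Counter

def build_word_map(captions, min_word_freq=5):
    """Map each sufficiently frequent word to an index and reserve special
    tokens, as suggested by the WORDMAP_*_min_word_freq.json filename."""
    freq = Counter(word for caption in captions for word in caption.split())
    words = [w for w in freq if freq[w] >= min_word_freq]
    word_map = {w: i + 1 for i, w in enumerate(sorted(words))}
    # Special tokens commonly used in Show, Attend and Tell pipelines.
    word_map["<unk>"] = len(word_map) + 1
    word_map["<start>"] = len(word_map) + 1
    word_map["<end>"] = len(word_map) + 1
    word_map["<pad>"] = 0
    return word_map
```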
- There are two choices for training: to train from scratch without previous checkpoints, set checkpoints=None in config.py; otherwise training resumes from the current best model (the default, i.e. the latest model checkpoint). Then run train.py:
$ python train.py
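The two modes above differ only in the checkpoint setting in config.py (the variable name follows the text above; the example path is the checkpoint filename used later in this README):

```python
# In config.py: train from scratch ...
checkpoints = None
# ... or resume from a saved model (example path from this README):
# checkpoints = './model_history/best_0417_20.pth.tar'
```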
First install pycocoevalcap for the CIDEr and SPICE metrics. (Problems with SPICE persist even after following the installation instructions; others have reported the same issue.)
$ python eval.py
All generated captions will be stored under the test_results directory.
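For intuition, the captioning metrics compare each generated caption against the reference captions (Flickr8k provides five per image). Below is a minimal sketch of clipped unigram precision, the building block of BLEU-1; this is an illustration only, since eval.py and pycocoevalcap implement the full metrics:

```python
from collections import Counter

def unigram_precision(hypothesis, references):
    """Clipped unigram precision: the fraction of hypothesis words that
    appear in some reference, with counts clipped per reference maximum."""
    hyp_counts = Counter(hypothesis)
    max_ref = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref[word] = max(max_ref[word], count)
    clipped = sum(min(c, max_ref[w]) for w, c in hyp_counts.items())
    return clipped / max(len(hypothesis), 1)
```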
Three choices:
- Generate one caption for a randomly selected input image:
$ python caption.py
- Generate captions for multiple randomly selected input images, e.g. 6 images:
$ python caption.py --num=6
- Generate a caption for a specified image, with a specified model and save path, e.g.:
$ python caption.py --word_map='./inputs/Intermediate_files/WORDMAP_flickr8k_5_cap_per_img_5_min_word_freq.json' --beam_size=5 --img='./inputs/Images/2877424957_9beb1dc49a.jpg' --save='./test_results/gen_3' --model='./model_history/best_0417_20.pth.tar'
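The flags above suggest caption.py parses its arguments roughly as follows. This is a sketch inferred from the usage examples in this README; the defaults and help strings are assumptions, not the project's actual values:

```python
import argparse

def build_parser():
    # Flags inferred from the caption.py commands shown above.
    parser = argparse.ArgumentParser(description="Generate image captions")
    parser.add_argument("--num", type=int, default=1,
                        help="number of randomly selected images to caption")
    parser.add_argument("--beam_size", type=int, default=5,
                        help="beam width for caption decoding")
    parser.add_argument("--img", help="path to a specific input image")
    parser.add_argument("--model", help="path to a saved model checkpoint")
    parser.add_argument("--word_map", help="path to the WORDMAP json file")
    parser.add_argument("--save", help="path for saved caption results")
    return parser
```

Note that argparse accepts both `--num=6` and `--num 6`, but not `--num = 6` with spaces around the equals sign.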