Human-Centric Image Captioning

This is an official Pytorch implementation for Human-Centric Image Captioning task.

This repo is modified from the well-known codebase by Ruotian Luo.

HC-COCO

HC-COCO, based on MSCOCO, is specially designed for Human-Centric Image Captioning task. It contains 16,125 images and 78,462 sentences, with more than 70% of the captions focusing on human actions and more than 49% focusing on human-object interactions. Furthermore, ten body part bounding boxes for each person are annotated. The dataset can be downloaded here

Requirements

Python 3.6
Java 1.8.0
PyTorch 1.0

The submodules (cider and coco-caption) could be downloaded here

Prepare Data

Please refer to here and place the file into ./coco-caption/annotations/

Download Pre-processed Features

Please download the Updown features and VC features for body part regions.
Please download the VC features
Please download the Updown features
Please download the part masks

Pretrained model

The pre-trained model can be download here

Start training

$ python train_hc.py --id HCCM --caption_model HCCM --input_json data/cocohc.json --input_label_h5 data/cocohc_label.h5 --input_att_dir_vc [the/path/to/VC_Feature/trainval] --input_att_dir [the/path/to/Updown_Feature] --body_part_dir [the/path/to/body_part_Updown_Feature] --body_part_vc_dir [the/path/to/body_part_VC_Feature] --part_mask_dir [the/path/to/part_mask_dir] --batch_size 10 --learning_rate 2e-4 --checkpoint_path log_hc --save_checkpoint_every 4000 --val_images_use 2500 --max_epochs 80 --rnn_size 2048 --input_encoding_size 1024 --self_critical_after 30 --language_eval 1 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --use_vc

NOTE: This command mix the cross-entropy and self-critical training. If you want to training them separately, you may need:

Cross Entropy Training

$ python train_hc.py --id HCCM --caption_model HCCM --input_json data/cocohc.json --input_label_h5 data/cocohc_label.h5 --input_att_dir_vc [the/path/to/VC_Feature/trainval] --input_att_dir [the/path/to/Updown_Feature] --body_part_dir [the/path/to/body_part_Updown_Feature] --body_part_vc_dir [the/path/to/body_part_VC_Feature] --part_mask_dir [the/path/to/part_mask_dir] --batch_size 10 --learning_rate 2e-4 --checkpoint_path log_hc --save_checkpoint_every 4000 --val_images_use 2500 --rnn_size 2048 --input_encoding_size 1024 --max_epochs 30 --language_eval 1

Self-critical Training

$ python train_hc.py --id HCCM --caption_model HCCM --caption_model HCCM --input_json data/cocohc.json --input_label_h5 data/cocohc_label.h5 --input_att_dir_vc [the/path/to/VC_Feature/trainval] --input_att_dir [the/path/to/Updown_Feature] --body_part_dir [the/path/to/body_part_Updown_Feature] --body_part_vc_dir [the/path/to/body_part_VC_Feature] --part_mask_dir [the/path/to/part_mask_dir] --batch_size 10 --learning_rate 2e-5 --start_from log_hc --checkpoint_path log_hc --save_checkpoint_every 4000 --language_eval 1 --val_images_use 2500 --self_critical_after 30 --rnn_size 2048 --input_encoding_size 1024 --cached_tokens coco-train-idxs --max_epoch 80

Evaluation

python eval_hc.py --model log_hc/model-best.pth --infos_path log_hc/infos_HCCM-best.pkl  --dump_images 0 --num_images -1 --language_eval 1 --beam_size 5 --batch_size 50 --split test

JohnDreamer/HCCM