This repository is not being actively maintained due to lack of time and interest. My sincerest apologies to the open source community for allowing this project to stagnate. I hope it was useful to some of you as a jumping-off point.
TensorFlow implementation of the paper: A Hierarchical Approach for Generating Descriptive Image Paragraphs
We did not fine-tune the parameters, but the model achieves the following scores:
Download the VisualGenome dataset, which provides two image folders: VG_100K and VG_100K_2. Following the paper, also download the JSON files for the training, val, and test splits. These three JSON files list the image names of the train, validation, and test data.
Run the script:
$ python split_dataset
This gives us the images from the [VisualGenome dataset] that the authors used in the paper.
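The splitting step can be sketched roughly as follows. This is a minimal illustration, not the repository's own script: it assumes each split file is a JSON list of image ids, that images are stored as `<id>.jpg`, and the function names (`load_split_ids`, `find_image`, `copy_split`) are hypothetical.

```python
import json
import shutil
from pathlib import Path


def load_split_ids(split_json):
    """Read one split file (assumed to be a JSON list of image ids)."""
    with open(split_json) as f:
        return [str(image_id) for image_id in json.load(f)]


def find_image(image_id, source_dirs):
    """Return the path of <image_id>.jpg in the first folder that has it, else None."""
    for source_dir in source_dirs:
        candidate = Path(source_dir) / f"{image_id}.jpg"
        if candidate.is_file():
            return candidate
    return None


def copy_split(split_json, source_dirs, target_dir):
    """Copy every image named in split_json into target_dir.

    Returns the list of ids that were not found in any source folder.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    missing = []
    for image_id in load_split_ids(split_json):
        src = find_image(image_id, source_dirs)
        if src is None:
            missing.append(image_id)
        else:
            shutil.copy(src, target / src.name)
    return missing
```

A call such as `copy_split("train_split.json", ["VG_100K", "VG_100K_2"], "data/train")` would then collect the training images in one folder; the file and folder names in that call are placeholders.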
## Step 2: Run the scripts
$ python get_imgs_train_path.py
$ python get_imgs_val_path.py
$ python get_imgs_test_path.py
This produces three text files: imgs_train_path.txt, imgs_val_path.txt, and imgs_test_path.txt. They store the paths of the train, val, and test images.
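As a rough idea of what such a path-listing script does, here is a minimal sketch. It is not the repository's code: the function name `write_image_paths`, the `.jpg` filter, and the choice of absolute paths are all assumptions.

```python
from pathlib import Path


def write_image_paths(image_dir, output_txt):
    """Write the absolute path of every .jpg in image_dir to output_txt, one per line.

    Returns the number of paths written.
    """
    paths = sorted(str(p.resolve()) for p in Path(image_dir).glob("*.jpg"))
    Path(output_txt).write_text("\n".join(paths) + ("\n" if paths else ""))
    return len(paths)
```

Writing one path per line keeps the file easy to inspect and matches the kind of list that a `-input_txt` style option typically consumes.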
After this, we use densecap (dense captioning) to extract features. Set up the running environment by following the densecap instructions step by step.
Run the script:
$ ./download_pretrained_model.sh
$ th extract_features.lua -boxes_per_image 50 -max_images -1 -input_txt imgs_train_path.txt \
-output_h5 ./data/im2p_train_output.h5 -gpu 0 -use_cudnn 1
We need to download the pre-trained model densecap-pretrained-vgg16.t7 first. Then, following the paper, we extract 50 boxes from each image.
Also, don't forget to extract features for the val and test images:
$ th extract_features.lua -boxes_per_image 50 -max_images -1 -input_txt imgs_val_path.txt \
-output_h5 ./data/im2p_val_output.h5 -gpu 0 -use_cudnn 1
$ th extract_features.lua -boxes_per_image 50 -max_images -1 -input_txt imgs_test_path.txt \
-output_h5 ./data/im2p_test_output.h5 -gpu 0 -use_cudnn 1
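Since the three extraction commands differ only in the split name, they can be generated from one template. The sketch below only builds the argument lists, reusing the flags and file names from the commands above; the helper name `build_extract_command` is hypothetical.

```python
import shlex

SPLITS = ("train", "val", "test")


def build_extract_command(split, boxes_per_image=50, gpu=0):
    """Return the argument list for one extract_features.lua run on the given split."""
    return [
        "th", "extract_features.lua",
        "-boxes_per_image", str(boxes_per_image),
        "-max_images", "-1",
        "-input_txt", f"imgs_{split}_path.txt",
        "-output_h5", f"./data/im2p_{split}_output.h5",
        "-gpu", str(gpu),
        "-use_cudnn", "1",
    ]


if __name__ == "__main__":
    # Print the three commands so they can be checked before running them.
    for split in SPLITS:
        print(shlex.join(build_extract_command(split)))
```

To execute a command instead of printing it, one option is `subprocess.run(build_extract_command("train"), check=True)`, run from the densecap directory.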
Run the script:
$ python parse_json.py
In this step, we process the paragraphs_v1.json file for training and testing. This produces the img2paragraph file in the ./data directory. Its structure is like this:
Finally, we can train and test the model. In the terminal:
$ CUDA_VISIBLE_DEVICES=0 ipython
>>> import HRNN_paragraph_batch
>>> HRNN_paragraph_batch.train()
After training, we can test the model:
>>> HRNN_paragraph_batch.test()