
A PyTorch implementation of the "Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection" paper by Liang et al.

This repository contains a PyTorch implementation of the "Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection" paper by Liang et al. [5].


Downloading the Data

First create a folder called data in the root folder of the repository (mkdir data). We will be using the Visual Genome Dataset to train this network. Follow the steps below to obtain the necessary data.

  1. You can access the first part of the data set here and the second part here. Once you have downloaded both parts, place all the images from both parts in a single folder called VG_100K inside the data folder.
  2. Inside the data folder, create a folder called raw_data. Download the following files into this new folder, and be sure to unzip the zip files.
wget http://visualgenome.org/static/data/dataset/objects.json.zip
wget http://visualgenome.org/static/data/dataset/relationships.json.zip
wget http://visualgenome.org/static/data/dataset/object_alias.txt
wget http://visualgenome.org/static/data/dataset/relationship_alias.txt
wget http://visualgenome.org/static/data/dataset/attributes.json.zip
wget http://visualgenome.org/static/data/dataset/scene_graphs.json.zip
  3. Create a folder called data_samples inside the data folder. This is where the train, validation, and test files will be placed.
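
Before moving on, it can help to confirm that the layout described above is in place. The following is a minimal sketch and not part of the repository: the helper name `missing_data_files` is hypothetical, and it assumes each zip file unpacks to a file of the same name without the `.zip` suffix.

```python
from pathlib import Path

# Files expected in data/raw_data after downloading and unzipping.
# Assumption: each archive unpacks to its own name minus ".zip".
EXPECTED_RAW_FILES = [
    "objects.json",
    "relationships.json",
    "object_alias.txt",
    "relationship_alias.txt",
    "attributes.json",
    "scene_graphs.json",
]


def missing_data_files(data_dir="data"):
    """Return the expected paths (as strings) that do not exist under data_dir."""
    root = Path(data_dir)
    expected = [root / "VG_100K", root / "data_samples"]
    expected += [root / "raw_data" / name for name in EXPECTED_RAW_FILES]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected data folders and files are present.")
```

Run it from the root folder of the repository so that the relative path `data` resolves correctly.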


We will be using PyTorch and other Python libraries to create our VRL model. Follow the instructions below to obtain the necessary dependencies.

  1. Install the appropriate version of PyTorch for your system from the PyTorch website.
  2. Run pip install -r requirements.txt
  3. Do the following if you want to use Faster R-CNN to generate box proposals and class labels, rather than using ground truth. We are using the Faster R-CNN implementation from here.
  • cd faster_rcnn
  • ./make.sh
  • Download the faster-rcnn model here to the same directory as main.py
  • Make sure CUDA is in your PATH (e.g., export PATH=$PATH:/usr/local/cuda/bin)
  4. Do the following if you want to include skip-thought history embeddings in your state vectors. We are using the skip-thoughts implementation from here.
  • cd skipthoughts
  • Download the following:
wget http://www.cs.toronto.edu/~rkiros/models/dictionary.txt
wget http://www.cs.toronto.edu/~rkiros/models/utable.npy
wget http://www.cs.toronto.edu/~rkiros/models/btable.npy
wget http://www.cs.toronto.edu/~rkiros/models/uni_skip.npz
wget http://www.cs.toronto.edu/~rkiros/models/uni_skip.npz.pkl
wget http://www.cs.toronto.edu/~rkiros/models/bi_skip.npz
wget http://www.cs.toronto.edu/~rkiros/models/bi_skip.npz.pkl
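
The two optional setups above can be sanity-checked with a short script. This is a sketch under stated assumptions and not part of the repository: the function names are hypothetical, the CUDA check only looks for the `nvcc` compiler on PATH, and the skip-thought files are assumed to sit directly in the skipthoughts folder.

```python
import shutil
from pathlib import Path

# Model files listed above, expected inside the skipthoughts folder.
SKIPTHOUGHT_FILES = [
    "dictionary.txt",
    "utable.npy",
    "btable.npy",
    "uni_skip.npz",
    "uni_skip.npz.pkl",
    "bi_skip.npz",
    "bi_skip.npz.pkl",
]


def cuda_compiler_on_path():
    """Return True if nvcc (the CUDA compiler) can be found on PATH."""
    return shutil.which("nvcc") is not None


def missing_skipthought_files(folder="skipthoughts"):
    """Return the listed model files that are absent from folder."""
    base = Path(folder)
    return [name for name in SKIPTHOUGHT_FILES if not (base / name).exists()]


if __name__ == "__main__":
    print("nvcc on PATH:", cuda_compiler_on_path())
    print("missing skip-thought files:", missing_skipthought_files())
```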

Create the Semantic Action Graph and Data Files

Run ./setup.sh. This will create predicate_counts.json, attribute_counts.json, and entity_counts.json, which contain the number of times each predicate, attribute, and entity (respectively) has occurred in the Visual Genome dataset. These counts are used to create the semantic action graph, which is saved as graph.pickle. We only consider predicates, attributes, and entities that have appeared at least 200 times. You can change this threshold by including a --min_occurances flag when running create_semantic_action_graph.py in setup.sh. The smaller this number is, the larger your graph will be. Lastly, train_data.json, validation_data.json, and test_data.json are created in data/data_samples/; these files are used to train and evaluate the model.
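
The effect of the occurrence threshold can be illustrated with a short sketch. This is not the code in create_semantic_action_graph.py: the function name `filter_by_min_occurrences` and the toy counts are hypothetical, and the sketch assumes each counts file maps a name to an integer count.

```python
def filter_by_min_occurrences(counts, min_occurrences=200):
    """Keep only the names that occur at least min_occurrences times."""
    return {name: n for name, n in counts.items() if n >= min_occurrences}


# Toy counts for illustration only; real values come from Visual Genome.
predicate_counts = {"on": 5000, "wearing": 1200, "juggling": 12}

kept = filter_by_min_occurrences(predicate_counts)
print(sorted(kept))  # ['on', 'wearing']

# Lowering the threshold keeps more names, so the graph grows.
kept_all = filter_by_min_occurrences(predicate_counts, min_occurrences=10)
print(len(kept_all))  # 3
```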


To begin training the network, run

python main.py --train

There are numerous flags that can be modified. You can see a list of these flags by running python main.py -h or by looking at main.py.


To evaluate a pretrained model, run

python main.py --test

Viewing Visualizations

After running main.py with the --train or --evaluate flag, a file called image_states.pickle will be created. When using the --train flag, the file is created only after at least one epoch has completed. Move image_states.pickle into the graphviz folder (mv image_states.pickle graphviz). Then run the following commands:

  1. Run python pickle_to_files.py. This will create two JSON files for each image: one representing the ground truth scene graph and the other representing the scene graph created by the VRL model.
  2. Run python visualize_scene_graph.py --graph <JSON filename> to visualize the graph represented by the JSON file.
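
To give a sense of what visualizing a scene graph involves, here is a minimal sketch that turns relationship triples into Graphviz DOT text. It is not the logic of visualize_scene_graph.py, and the JSON layout produced by pickle_to_files.py is not described here; the triple format and the function name `triples_to_dot` are assumptions made for illustration.

```python
def triples_to_dot(triples):
    """Build Graphviz DOT source from (subject, predicate, object) triples."""
    lines = ["digraph scene_graph {"]
    for subject, predicate, obj in triples:
        # Each relationship becomes a labeled edge from subject to object.
        lines.append(f'  "{subject}" -> "{obj}" [label="{predicate}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical relationships for a single image.
example = [("man", "wearing", "hat"), ("man", "on", "horse")]
print(triples_to_dot(example))
```

The printed text can be saved to a file and rendered with the Graphviz command-line tool, for example dot -Tpng graph.dot -o graph.png.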


Below we have included one example of a scene graph generated using VRL.

This project was originally done for a Reinforcement Learning class at Stanford University (CS234). The poster for this project can be found here and the final report can be found here.


