Warning: This project reproduces the DenseCap paper with TensorFlow. For now, it is a simplified reproduction of the original:
- Totally separate training stages. I first trained the RPN model, then trained the RNN model using the best proposals from the RPN model and their corresponding sentences.
- RPN model. I used an RoI pooling layer anyway, rather than the bilinear interpolation used in the DenseCap paper.
- Get my project. $ git clone the repo.
- Configure dependencies. You need to configure py-caffe, TensorFlow and visual_genome.
- Get the VGG-16 caffemodel. I use the 16-layer VGG network to extract features.
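To make the RoI pooling step mentioned above more concrete, here is a minimal pure-Python sketch of max pooling one region of a feature map into a fixed-size grid. It is not the code from this repo: the function name `roi_max_pool`, the single-channel list-of-lists feature map, and the inclusive `(x1, y1, x2, y2)` box convention are all assumptions made for illustration. A real layer would run over every channel of the VGG-16 features.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2D feature map into an out_h x out_w grid.

    feature_map: list of rows (H x W), a single channel (assumption).
    roi: (x1, y1, x2, y2) in feature-map coordinates, inclusive (assumption).
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Bin start is floored and bin end is ceiled, so every bin is
        # non-empty and the bins together cover the whole region.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + -(-((i + 1) * roi_h) // out_h)  # ceiling division
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [[0, 1, 2, 3],
            [4, 5, 6, 7],
            [8, 9, 10, 11],
            [12, 13, 14, 15]]
    print(roi_max_pool(fmap, (0, 0, 3, 3), 2, 2))
```

Whatever the size of the proposal, the output grid has the same shape, which is what lets proposals of different sizes feed a fixed-size layer afterwards.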
Due to time constraints, I trained the net on only 10 pictures. If you'd like to see the result, use the pretrained model to predict on 'images/*.jpg'. You only need to run 'demo.py'.
- Get the data. Download the training data folder from visual_genome.
- Train RPN model. Run 'RPN.py'.
- Train RNN model. Run 'RNN.py'.
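The RNN stage is trained on the best RPN proposals together with their sentences. One plausible way to build those pairs, shown here only as a sketch, is to keep, for each annotated region, the proposal with the highest intersection-over-union (IoU). This selection rule is an assumption, not necessarily what 'RNN.py' does, and the names `iou` and `pair_proposals_with_sentences` are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def pair_proposals_with_sentences(proposals, regions):
    """Pair each annotated sentence with its best-overlapping proposal.

    proposals: non-empty list of boxes from the first stage.
    regions: list of (box, sentence) annotations (assumed data layout).
    Returns a list of (best_proposal, sentence) tuples.
    """
    pairs = []
    for box, sentence in regions:
        # Assumed rule: "best" means highest IoU with the annotated box.
        best = max(proposals, key=lambda p: iou(p, box))
        pairs.append((best, sentence))
    return pairs


if __name__ == "__main__":
    proposals = [(0, 0, 10, 10), (20, 20, 40, 40)]
    regions = [((1, 1, 11, 11), "a cat"), ((18, 22, 38, 42), "a tree")]
    for proposal, sentence in pair_proposals_with_sentences(proposals, regions):
        print(proposal, sentence)
```

Because the two stages are trained separately, pairs like these could be computed once after the RPN stage and then reused for every RNN training run.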