Image captioning is an interesting area of deep learning that combines a convolutional neural network (CNN) with a recurrent neural network (RNN). The CNN first extracts image features, which are then fed into the RNN to generate a description of the image.
This part is based on Karpathy's neuraltalk2, which is implemented in Torch. I reimplemented it in TensorFlow and converted the pretrained model. The model consists of VGG-16 and an LSTM. Google's im2txt may be a better choice, but no pretrained model is available for it and I don't have a graphics card powerful enough to train my own. For more details, please see these two repos.
caption: a cat and a dog are standing in a room
caption: a white and black cat is sitting on the ground
caption: a brown bear sitting on top of a rock
caption: a black and white cat sitting on top of a table
In past decades, hand-crafted features such as visual bag of words were developed for image search. Today, almost every search engine uses deep learning to extract image features automatically. To avoid training a new model, I use the VGG-16 above as the feature extractor. However, it was fine-tuned for image captioning, so the results are not as good as I expected.
I use the output of the last pooling layer as the feature and a KD-tree to search for the top-k nearest neighbors. Similarity is measured by Euclidean distance. The database is built from a 10,000-image subset of MSCOCO. There is an example below.
query
top-3 results
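The nearest-neighbor search described above can be sketched roughly as follows. This is a minimal illustration, not the code from this repo: the helper names `build_index` and `search_top_k` are hypothetical, and random vectors stand in for the real VGG-16 pooling features.

```python
import numpy as np
from sklearn.neighbors import KDTree


def build_index(features):
    """Build a KD-tree over an (n_images, n_dims) feature matrix."""
    return KDTree(np.asarray(features, dtype=np.float64), metric="euclidean")


def search_top_k(tree, query_feature, k=3):
    """Return (indices, distances) of the k nearest database images."""
    query = np.asarray(query_feature, dtype=np.float64).reshape(1, -1)
    distances, indices = tree.query(query, k=k)  # sorted by ascending distance
    return indices[0], distances[0]


# Assumption: random vectors replace the flattened pooling-layer features.
rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 64))
tree = build_index(database)

# A slightly perturbed copy of image 42 should retrieve image 42 first.
query = database[42] + 0.01 * rng.normal(size=64)
indices, distances = search_top_k(tree, query, k=3)
print(indices, distances)
```

In practice, each row of `database` would be the flattened feature of one MSCOCO image, and the returned indices would be mapped back to image file names.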
Using captions as keywords, it is easy to search for similar images that contain the same objects.
Keyword search is, in general, a complex optimization problem. I simply use a weighted sum as the objective function: nouns and verbs are weighted 1.0, adjectives and adverbs 0.5, and all other words 0.1. The larger the weighted sum, the more similar the image. There is an example below.
query
a red car
top-3 results
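The weighted-sum scoring described above might look roughly like the sketch below. It assumes the query is already tagged with Penn Treebank part-of-speech tags (for example by `nltk.pos_tag`) and that a query word counts only when it appears verbatim in a caption; that matching rule, the helper names, and the second and third captions are illustrative assumptions, not taken from this repo.

```python
def pos_weight(tag):
    """Map a Penn Treebank tag to the weight used in the weighted sum."""
    if tag.startswith("NN") or tag.startswith("VB"):  # nouns and verbs
        return 1.0
    if tag.startswith("JJ") or tag.startswith("RB"):  # adjectives and adverbs
        return 0.5
    return 0.1  # everything else (determiners, prepositions, ...)


def caption_score(tagged_query, caption):
    """Sum the weights of the query words that occur in the caption."""
    caption_words = set(caption.lower().split())
    return sum(pos_weight(tag) for word, tag in tagged_query
               if word.lower() in caption_words)


def rank_captions(tagged_query, captions, k=3):
    """Return the k best (caption_index, score) pairs, highest score first."""
    scored = [(caption_score(tagged_query, c), i) for i, c in enumerate(captions)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, s) for s, i in scored[:k]]


# The query "a red car", tagged by hand for this example.
query = [("a", "DT"), ("red", "JJ"), ("car", "NN")]
captions = [
    "a brown bear sitting on top of a rock",
    "a red car parked on the street",   # made-up caption
    "a white car driving down a road",  # made-up caption
]
ranking = rank_captions(query, captions, k=3)
print(ranking)
```

Because the noun "car" carries the largest weight, captions that mention a car rank above those that only share the determiner "a".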
The first notable neural style work is this paper, which combines content and style features well and solves the task as an optimization problem. For more details, see jcjohnson's repo, which is implemented in Torch.
I use the famous painting Girl with a Pearl Earring as content and The Starry Night as style.
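To make the optimization view concrete, here is a minimal NumPy sketch of the kind of objective used in optimization-based neural style: a content term on raw feature maps plus a style term on Gram matrices. It uses a single layer, random arrays in place of real CNN activations, and illustrative weights `alpha` and `beta`; all of these are simplifying assumptions, and this is not the code from the repos mentioned above.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlations of an (H, W, C) feature map."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat  # shape (C, C)


def content_loss(generated, content):
    """Half the squared error between two feature maps of equal shape."""
    return 0.5 * np.sum((generated - content) ** 2)


def style_loss(generated, style):
    """Squared error between Gram matrices, normalized by layer size."""
    h, w, c = generated.shape
    diff = gram_matrix(generated) - gram_matrix(style)
    return np.sum(diff ** 2) / (4.0 * c ** 2 * (h * w) ** 2)


def total_loss(generated, content, style, alpha=1.0, beta=1000.0):
    """Weighted sum that the generated image is optimized to minimize."""
    return alpha * content_loss(generated, content) + beta * style_loss(generated, style)


# Assumption: random arrays stand in for activations of one CNN layer.
rng = np.random.default_rng(0)
content_feat = rng.normal(size=(8, 8, 16))
style_feat = rng.normal(size=(8, 8, 16))
generated_feat = rng.normal(size=(8, 8, 16))
loss = total_loss(generated_feat, content_feat, style_feat)
print(loss)
```

In the full method the loss is differentiated with respect to the pixels of the generated image, and the style term is usually summed over several layers.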
jcjohnson improved neural style by training a transformation network that transfers a style onto images in real time (paper and repo in Torch).
I reimplemented it in TensorFlow, but I haven't trained a usable model yet because it would take too much time on my graphics card. I don't have enough time right now for the experiments needed to find good hyperparameters, and I will do them in the future.
You can also see this repo, which is implemented in TensorFlow as well.
This is a fascinating implementation of Neural Style in TensorFlow.
This is a TensorFlow implementation of the paper, which can transfer arbitrary styles.
A simple GUI was written that contains all the methods above.
- tensorflow
- PIL
- numpy
- sklearn
- nltk
- PyQt4
This is the pretrained model; place it in the root directory. (I will also look for an online disk outside mainland China.)
Some ideas are borrowed from: