/keras_image_similarity_training

A Jupyter Notebook and python scripts that allows users to easily train a siamese network on image similarity, export the model to a SavedModel file, and index images for kNN search.

Primary LanguageJupyter NotebookBSD 2-Clause "Simplified" LicenseBSD-2-Clause

Keras Image Similarity Training

Train a convolutional neural network to determine content-based similarity between images. This is done with a siamese neural network as shown here. The model learns from labeled images of similar and dissimiar pairs. The model's objective is to embed similar pairs nearby and dissimilar pairs far apart. This property of the latent space means kNN searches can find similar images. This idea is based on the paper found here.

Requirements

Labeled Data

For both training and indexing, labeled data will be needed. This data needed is multiple images of each unique item. Create a JSON file such as the one seen below. The key of top level items should be the item_id. Each value should have an images array, which contains data on each image for that item. Optionally, you can also provide labels for each item_id, where two items sharing some label will not be considered dissimilar.

{
	"item_id_1": {
		"images": [
			{
				"filename": "relative/path/to/item_1_1.jpg"
			},
			{
				"filename": "relative/path/to/item_1_2.jpg"
			}
		],
		"labels": ["red", "pink"]
	},
	"item_id_2": {
		"images": [
			{
				"filename": "relative/path/to/item_2_1.jpg"
			},
			{
				"filename": "relative/path/to/item_2_2.jpg"
			}
		],
		"labels": ["blue"]
	}
}

Training

For training a model, you will definitely need a GPU. If you do not have one, then we suggest only using a pretrained model provided by Keras's API.

Notebook

We provide a Jupyter notebook that will walk you through how to train a siamese network. Note you will need a machine with an Nvidia GPU here.

DATA=/path/to/images/and/label/files make notebook

Exporting Model

If you trained a model, run the following

make bash-cpu
python utilities.py --export savedmodel --keras-model checkpoints/file_saved_by_notebook.hdf5

Else you can use Google's pretrained model on classification

make bash-cpu
python utilities.py --export savedmodel

Indexing

Images need to be embedded and indexed for fast kNN search.

GPU and a trained model

DATA=/path/to/images/and/label/files make bash-gpu
python utilities.py --export balltree \
    --keras-model checkpoints/file_saved_by_notebook.hdf5 \
    --labeled-data /data/path_to_labeled_images_file.json \
    --image-dir /data/whereever_the_base_image_dir_is_mounted

GPU and Google's pretrained model

DATA=/path/to/images/and/label/files make bash-gpu
python utilities.py --export balltree \
    --labeled-data /data/path_to_labeled_images_file.json \
    --image-dir /data/whereever_the_base_image_dir_is_mounted

CPU and Google's pretrained model

DATA=/path/to/images/and/label/files make bash-cpu
python utilities.py --export balltree \
    --labeled-data /data/path_to_labeled_images_file.json \
    --image-dir /data/whereever_the_base_image_dir_is_mounted