Content-Based Image Retrieval with Deep Learning
About the Project
Although we communicate with each other in a variety of ways, our favorite way to do so is via the written word. However, when you think, do you think in words or in images? Pictures are sometimes easier to recognize and process than words. What is more, they can communicate things that are hard to verbalize, such as thoughts, feelings, and memories. So, how can we improve information retrieval and accessibility via images?
There are two computer vision methods we've looked into:
- Bag of Visual Words: The general idea is to represent an image as a set of features. Features consist of keypoints and descriptors. We use the keypoints and descriptors to construct visual vocabularies and then quantize the image features. By doing so, we represent each image as a frequency histogram of the visual words it contains. With these visual vocabularies, we can later perform many tasks, such as classification, retrieval, and more.
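The vocabulary-building and quantization steps above can be sketched as follows. This is a minimal illustration using scikit-learn's KMeans on synthetic descriptors; in the actual pipeline the descriptors would come from a keypoint detector (e.g. via OpenCV), and the vocabulary size here is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for local descriptors (e.g. 128-D SIFT-like) pooled from many images.
all_descriptors = rng.normal(size=(500, 128))

# Step 1: build the visual vocabulary by clustering the pooled descriptors.
k = 10  # vocabulary size (number of visual words); a tuning choice
vocab = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)

# Step 2: quantize one image's descriptors against the vocabulary
# and build a normalized frequency histogram of visual words.
image_descriptors = rng.normal(size=(40, 128))
words = vocab.predict(image_descriptors)
histogram = np.bincount(words, minlength=k).astype(float)
histogram /= histogram.sum()  # normalize so images with different keypoint counts compare fairly
```

The resulting histogram is the fixed-length vector that gets indexed and compared at retrieval time.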
- Visual Embeddings: These refer to the features of the last fully connected layer (prior to the loss layer) appended to a CNN. The visual embeddings are learned by jointly training the feature extractor, the embedding layer, and the classifier on the classification task.
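Once images are mapped to embedding vectors, retrieval reduces to nearest-neighbor search in the embedding space. A minimal NumPy sketch of cosine-similarity ranking, using random vectors as stand-ins for the CNN's learned embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-ins for learned visual embeddings (in the project, outputs of the
# CNN's last fully connected layer before the loss layer).
index_embeddings = rng.normal(size=(100, 64))  # 100 indexed images, 64-D
query_embedding = rng.normal(size=(64,))

# L2-normalize so the dot product equals cosine similarity.
index_norm = index_embeddings / np.linalg.norm(index_embeddings, axis=1, keepdims=True)
query_norm = query_embedding / np.linalg.norm(query_embedding)

scores = index_norm @ query_norm        # cosine similarity per indexed image
top10 = np.argsort(scores)[::-1][:10]   # indices of the 10 most similar images
```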
The two Information Retrieval Systems we have explored are evaluated using the trec_eval evaluation tool and its metrics. Our focus is mainly on the behaviour of mean average precision over the top 100 retrieved images.
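For intuition, average precision at a cutoff can be sketched in a few lines. Note this is only an illustration, not the trec_eval tool itself, and it uses a common variant that normalizes by the number of relevant items retrieved within the cutoff (trec_eval's map normalizes by the total number of relevant documents for the query):

```python
def average_precision_at_k(ranked_relevance, k=100):
    """AP@k over a binary relevance list ordered by retrieval rank."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(runs, k=100):
    """Mean of AP@k across all queries."""
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs)

# Example: relevant images at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
ap = average_precision_at_k([1, 0, 1, 0])
```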
For the full presentation of the problem, our approach, the results, and the system's architecture, you can download and look into this report (PowerPoint format).
Dataset
To build the search engine, the CIFAR-10 dataset has been used. This is an image dataset by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton, publicly available from the University of Toronto. You can download it from here.
The data consist of 60,000 images: 50,000 training images and 10,000 test images. Each image is a 32x32 color image. The dataset contains 10 classes which are mutually exclusive (i.e. there is no overlap between them).
- Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
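In the downloadable batches, each image is stored as a flat row of 3072 values: the first 1024 are the red channel, the next 1024 green, and the last 1024 blue, each in row-major order. A small sketch of turning one row back into a 32x32x3 image:

```python
import numpy as np

def row_to_image(row):
    """Convert one CIFAR-10 batch row (3072 values: R, G, B planes) to a 32x32x3 array."""
    return np.asarray(row, dtype=np.uint8).reshape(3, 32, 32).transpose(1, 2, 0)

# Example with a synthetic row; real rows come from the unpickled data batches.
fake_row = np.arange(3072) % 256
image = row_to_image(fake_row)  # a 32x32 RGB image array
```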
To build the search engine, we treat the training images as index images, meaning that these will be indexed in our search engine. In the same manner, we treat the test images as query images. Query relevance is defined as follows: each query image (test image) is associated with a set of indexed images (training images), where the relevance relationship depends on the class label.
- For example, a query image that is a car is associated with indexed images that belong to the car class.
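This label-based relevance judgment is simple enough to state directly in code; the labels below are illustrative:

```python
def is_relevant(query_label, index_label):
    """A retrieved image is relevant iff it shares the query image's class label."""
    return query_label == index_label

# Example: an 'automobile' query scored against a few indexed images' labels.
index_labels = ["automobile", "dog", "automobile", "ship"]
relevance = [is_relevant("automobile", lbl) for lbl in index_labels]
# relevance == [True, False, True, False]
```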
Technologies
Programming Language: Python
Search Engine: Elasticsearch
Machine Learning: OpenCV, Scikit-Image, Scikit-Learn
Deep Learning: Pytorch, Ray
Frontend: HTML, Jinja2, CSS
Application Framework: Flask
Other Libraries: NumPy, Matplotlib
What You Need
- Anaconda
- Elasticsearch client
- Virtual environments from .yml files:
  - Create the environments from the cbir-ml.yml and cbir-dl.yml files:
    conda env create -f cbir-ml.yml
    conda env create -f cbir-dl.yml
- CIFAR-10 dataset:
  - Activate the cbir-ml environment:
    conda activate cbir-ml
  - Run notebooks/Search Engine Files (Miscellaneous).ipynb in Jupyter Notebook (no need to run the 3rd section).
  - The CIFAR-10 data can then be found under static/cifar10/.
Run the Application
To run the application:
- Start the Elasticsearch client (on Windows) by running elasticsearch-x.xx.x/bin/elasticsearch.bat.
- Activate the cbir-dl environment:
  conda activate cbir-dl
- Run the following command in a terminal window, in the complete directory:
  python app.py

Then, in the browser, visit http://localhost:5000/ to open the web page.
Demo
- Run application.
- Upload your image query and search.
- Scroll down to see the top 10 relevant images, with respect to your query.
License
Distributed under the MIT License. See LICENSE.md for more information.