/VectorSearch_image_retrieval

An image retrieval system built on MongoDB vector search features. Images have been converted into embeddings via a Hugging Face transformer model.

Primary language: Jupyter Notebook

MongoDB Atlas VectorSearch Image Retrieval System

Description

The aim of this work is to find similar images in our dataset, mainly images of jeans, T-shirts, TVs, and sofas. We use the MongoDB Atlas VectorSearch feature to build the image similarity search system. The dataset images are converted into embeddings and hosted on a MongoDB Atlas cluster. For retrieval, the query image is first converted into an embedding; MongoDB Atlas VectorSearch then retrieves the top k most similar images, where k=5 in our case. Cosine similarity is used as the similarity measure. Embeddings are generated with a pretrained Vision Transformer (ViT) model.
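To make the retrieval step concrete, here is a minimal pure-Python sketch of what a top-k cosine-similarity lookup computes. It is a brute-force illustration only, not how Atlas performs the search internally; the function names, filenames, and toy 3-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, embeddings, k=5):
    """Return the k (filename, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]

# Toy 3-dimensional "embeddings"; real ViT embeddings have hundreds of dimensions.
toy_embeddings = {
    "jean_01.jpg": [0.9, 0.1, 0.0],
    "tshirt_01.jpg": [0.1, 0.9, 0.0],
    "sofa_01.jpg": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], toy_embeddings, k=2)
print(results)
```

In the actual system this ranking is delegated to the Atlas search index, so no pairwise comparison against all 796 images has to be written by hand.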

Dataset

The dataset used in this work was downloaded from Kaggle and contains 796 images, divided into 4 classes: Jean, Tshirt, TV and Sofa.

Dataset Collection Overview

VectorSearch_DB_Overview

Requirements

  • Transformers
  • OS (Python standard library)
  • Pillow
  • Requests
  • Glob (Python standard library)
  • Matplotlib
  • Numpy
  • Dotenv (python-dotenv)
  • PyMongo

Steps

  1. Load all images from your dataset and create their embeddings via a pretrained vision transformer model.
  2. Pair each image_filename with its embedding in a dictionary and store it in a MongoDB Atlas database.
  3. Create a search index in MongoDB Atlas (see the Atlas Search Configuration section) to be used later for image retrieval.
  4. Load the query image, create its embedding, then retrieve similar images.
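The steps above can be sketched roughly as follows. This is a hedged outline, not the notebook's actual code: the checkpoint name `google/vit-base-patch16-224-in21k`, the `embedding` field name, the index name `vector_index`, the use of the `$vectorSearch` aggregation stage, and the PyTorch dependency are all assumptions; only `image_filename` and k=5 come from this README. Heavy libraries are imported lazily so the helper functions can be read and tested without them.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def load_vit(model_name):
    """Load the image processor and ViT model once and reuse them."""
    from transformers import ViTImageProcessor, ViTModel
    return ViTImageProcessor.from_pretrained(model_name), ViTModel.from_pretrained(model_name)

def embed_image(path, model_name="google/vit-base-patch16-224-in21k"):
    """Return the ViT [CLS] embedding of one image as a list of floats.

    Assumption: the checkpoint name is illustrative; the project may use another.
    """
    import torch
    from PIL import Image

    processor, model = load_vit(model_name)
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # The first token of the last hidden state is the [CLS] token.
    return outputs.last_hidden_state[0, 0].tolist()

def build_document(filename, embedding):
    """Step 2: pair a filename with its embedding ("embedding" is an assumed field name)."""
    return {"image_filename": filename, "embedding": list(embedding)}

def build_query_pipeline(query_embedding, k=5, index_name="vector_index", num_candidates=100):
    """Step 4: aggregation pipeline that asks Atlas for the top-k nearest images."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,          # assumed index name
                "path": "embedding",          # field that stores the vectors
                "queryVector": list(query_embedding),
                "numCandidates": num_candidates,
                "limit": k,
            }
        },
        {
            "$project": {
                "_id": 0,
                "image_filename": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

# Usage against a real cluster (not executed here; `collection` is a PyMongo collection):
# docs = [build_document(p, embed_image(p)) for p in glob.glob("dataset/*.jpg")]
# collection.insert_many(docs)
# hits = list(collection.aggregate(build_query_pipeline(embed_image("query.jpg"))))
```

Keeping the document and pipeline construction in plain functions makes it easy to check the query shape before connecting to a cluster.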

System Pipeline

VectorSearch_pipeline

Atlas Search Configuration

VectorSearch
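For readers who cannot see the configuration image, an Atlas Vector Search index definition for a setup like this one might look like the sketch below, written as a Python dict and printed as JSON for pasting into the Atlas JSON editor. The field name `embedding` and the 768 dimensions (the hidden size of a ViT-Base model) are assumptions, not values taken from this project; match them to your own documents and model.

```python
import json

# Assumptions: embeddings live in a field named "embedding" and have 768
# dimensions (the hidden size of a ViT-Base model). Adjust both as needed.
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 768,
            "similarity": "cosine",  # matches the cosine similarity used in this work
        }
    ]
}

print(json.dumps(index_definition, indent=2))
```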

Results

Below is a set of retrieval results. Based on these tests, we observe that the system uses both object shape and color to rank candidates, and that in each case it retrieves the exact match of the query image.

Query 1

test_result_1

Query 2

test_result_2

Query 3

test_result_3

Query 4

test_result_4

Query 5

test_result_5

Notes

  1. Create a .env file containing your MongoDB Atlas connection string, that is, the credentials used to connect to your cluster.
  2. Adjust the paths in the code to match your local directory structure.
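As an illustration of the first note, the connection string can be read from the .env file roughly as follows. The variable name `MONGODB_URI` and the helper name are hypothetical; use whatever key your .env file defines. The sketch falls back to the plain environment if python-dotenv is not installed.

```python
import os

def load_connection_string(var_name="MONGODB_URI"):
    """Return the Atlas connection string from .env or the environment.

    Assumption: the variable name is illustrative, not taken from the project.
    """
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package
        load_dotenv()  # copies key=value pairs from .env into os.environ
    except ImportError:
        pass  # fall back to variables already set in the environment
    uri = os.getenv(var_name)
    if not uri:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return uri

# Usage (not executed here; requires PyMongo and a reachable cluster):
# from pymongo import MongoClient
# client = MongoClient(load_connection_string())
```

Keep the .env file out of version control so the credentials are not published with the notebook.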