Assignment 1 for the course "Visual Analytics" at Aarhus University. Solution by Anton Drasbæk Schiønning.

Assignment 1: Simple Image Search

Repository Overview

  1. Description
  2. Repository Tree
  3. Usage
  4. Modified Usage
  5. Exemplary Results
  6. Discussion

Description

This repository includes the solution by Anton Drasbæk Schiønning (202008161) to assignment 1 in the course "Visual Analytics" at Aarhus University.

The code performs an image search on an image database, such as the Flower Dataset, to identify the images most similar to a selected one. Two search methods are investigated:

  1. Similarity based on color channels
  2. Similarity based on the k-nearest neighbors algorithm applied to image features extracted with VGG16
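Both methods can be read as the same ranking step applied to different image representations. The sketch below is a minimal illustration of that idea, not the code in src/. It assumes that the color search compares normalized per-channel histograms with a symmetric chi-square distance and that the KNN search is a brute-force Euclidean lookup over feature vectors that were already extracted with VGG16. All function names and the toy data are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # one (R, G, B) value, each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 8) -> List[float]:
    """Concatenate one normalized histogram per color channel."""
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map the range 0..255 onto bin indices 0..bins-1.
            hist[channel * bins + value * bins // 256] += 1.0
    # Normalize so that images of different sizes stay comparable.
    return [count / len(pixels) for count in hist]


def chi_square(a: Sequence[float], b: Sequence[float]) -> float:
    """Symmetric chi-square distance between histograms (0 means identical)."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, assumed here for the nearest-neighbor lookup."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_n_similar(
    target: str,
    vectors: Dict[str, Sequence[float]],
    distance: Callable[[Sequence[float], Sequence[float]], float],
    n: int = 5,
) -> List[Tuple[str, float]]:
    """Return the n entries closest to the target, nearest first."""
    scores = [
        (name, distance(vectors[target], vec))
        for name, vec in vectors.items()
        if name != target  # never report the target as its own match
    ]
    return sorted(scores, key=lambda item: item[1])[:n]


# Toy data (hypothetical): three tiny "images" given as lists of RGB pixels.
images = {
    "red.jpg": [(250, 10, 10)] * 4,
    "dark_red.jpg": [(200, 10, 10)] * 4,
    "blue.jpg": [(10, 10, 250)] * 4,
}
histograms = {name: color_histogram(px) for name, px in images.items()}
color_ranking = top_n_similar("red.jpg", histograms, chi_square, n=2)

# Toy feature vectors standing in for VGG16 features (hypothetical values).
features = {"a.jpg": [0.0, 0.0], "b.jpg": [3.0, 4.0], "c.jpg": [1.0, 0.0]}
knn_ranking = top_n_similar("a.jpg", features, euclidean, n=2)
```

In this toy setup the darker red image ranks ahead of the blue one for the color search, and c.jpg ranks ahead of b.jpg for the feature search. With real data, the feature vectors would come from a pretrained network rather than being written by hand.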

Repository Tree

├── README.md         
├── assign_desc.md    
├── data/             
│   └── flowers.zip         -----> flowers.zip file with pictures of all flowers (should be unzipped)
├── out/              
│   ├── color_channels   
│   │   ├── colorchannels_most_similar.csv  -----> table of the most similar images to the main images and their similarity scores
│   │   └── colorchannels_most_similar.png  -----> visualization of the most similar images
│   └── knn
├── requirements.txt   
├── setup.sh           
├── src/               
│   ├── color_search.py     -----> script for search based on color channels
│   ├── knn_search.py       -----> script for search based on nearest neighbors for VGG16 features
│   └── utils.py            -----> functions that are used for both image searches
└── run.sh



Usage

This project only assumes that you have Python 3 installed. Before running anything, unzip flowers.zip so that the resulting flowers folder sits inside the data directory. The flowers folder is listed in .gitignore, so it will not be pushed.
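If you prefer to unpack the archive from Python rather than with a system tool, a minimal sketch using the standard library could look like the following. The function name unpack_archive is hypothetical, and the sketch assumes that the archive contains a top-level flowers folder, as the description above suggests.

```python
import zipfile
from pathlib import Path
from typing import List


def unpack_archive(archive: Path, target_dir: Path) -> List[str]:
    """Extract every member of the archive into target_dir."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        # Report what was extracted so the caller can verify the layout.
        return sorted(zf.namelist())


# Intended call from the repository root (not executed here):
# unpack_archive(Path("data/flowers.zip"), Path("data"))
```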

To run the full analysis, including an image search using both color channels and KNN, run the run.sh file from the root directory:

bash run.sh

This will complete the following steps:

  • Create and activate a virtual environment
  • Install requirements to that environment
  • Run the image search using color channels for the default image
  • Run the image search using KNN for the default image
  • Deactivate the environment

Modified Usage

If you want to investigate a different picture or compare with more/less images, you can run a modified analysis.

Setup

Apart from unzipping flowers.zip, you must run the setup file from the root directory to install requirements and initialize a virtual environment:

bash setup.sh

Run Modified Analysis

A modified analysis is configured through two arguments:

Argument          Default Value       Description
--filename, -f    "image_0033.jpg"    Filename specifying which image to base the search on.
--top_n, -n       5                   How many most similar images to find.

Both arguments work with the color channel search and with the KNN search, for example:

# run the analysis for image_0023 with the 8 most similar images, using both search techniques
python src/color_search.py --filename "image_0023.jpg" --top_n 8
python src/knn_search.py --filename "image_0023.jpg" --top_n 8
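For readers who want to see how such arguments are typically declared, here is a minimal argparse sketch that mirrors the flags and defaults listed above. It is an illustration only: the function name parse_args and the help texts are assumptions, and the actual scripts may define their parsers differently.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the two documented arguments, falling back to their defaults."""
    parser = argparse.ArgumentParser(
        description="Find the images most similar to a target image."
    )
    parser.add_argument(
        "--filename", "-f",
        type=str,
        default="image_0033.jpg",
        help="filename of the image to base the search on",
    )
    parser.add_argument(
        "--top_n", "-n",
        type=int,
        default=5,
        help="how many most similar images to find",
    )
    # argv=None makes argparse read sys.argv, as a script normally would.
    return parser.parse_args(argv)


defaults = parse_args([])
custom = parse_args(["-f", "image_0023.jpg", "-n", "8"])
```

Passing an explicit list to parse_args, as in the last two lines, is a convenient way to try out argument combinations without going through the command line.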

Exemplary Results

The example results below were produced by running run.sh and can also be found in the out directory.
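As a rough idea of how a ranking could end up as a CSV table like the one in out, the sketch below writes (filename, score) pairs with the standard csv module. The column names, the number formatting, and the example row are all hypothetical; the files in out may be laid out differently.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_ranking(rows: Iterable[Tuple[str, float]], handle: TextIO) -> None:
    """Write one line per similar image, preceded by a header row."""
    writer = csv.writer(handle)
    writer.writerow(["filename", "distance"])  # hypothetical column names
    for filename, distance in rows:
        writer.writerow([filename, f"{distance:.4f}"])


# Made-up row written to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_ranking([("example_a.jpg", 0.5)], buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, open the target file with newline="" and pass the file handle in place of the buffer.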

Color Channel Search

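[Figure: visualization of the most similar images from the color channel search, out/color_channels/colorchannels_most_similar.png]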

KNN/VGG16-feature search

[Figure: visualization of the most similar images from the KNN/VGG16-feature search, see out/knn]

Discussion

As expected, the results suggest that the KNN image search based on VGG16 features is far superior to the search based on color channels.

For the color channel search, only image_0358.jpg appears to be of a similar species to the target image. For the KNN search, however, all five retrieved flowers closely resemble the target and are presumably the same species. Still, this is merely one example, and results may vary depending on the chosen image. Running modified analyses as described above is therefore encouraged to test whether the tendency holds.