machine-learning-3

Implement the K-means clustering algorithm to cluster the images in the dataset available at https://classroom.github.com/a/pCY-PkJx. The images are part of the CIFAR-10 machine learning benchmark dataset and cover three types (airplane, bird and truck). You need to implement the following steps:

1. Apply your K-means algorithm with K = 3 to the data provided in the train folder. The initialization changes the results of K-means significantly, so run K-means 10 times, starting from a different random initialization each time.
2. Use the Davies-Bouldin index to choose the best of the 10 outcomes you obtained in Step 1.
3. For each class of images (airplane, bird and truck), identify the cluster to which the majority of its images belong, and hence the corresponding center. For instance, if the first 5,000 images were clustered as 500 in Cluster 1, 1,000 in Cluster 2 and 3,500 in Cluster 3, then the majority of the images in the airplane class were clustered into Cluster 3. Cluster 3 should therefore be considered the cluster representing the airplane class, and the center of Cluster 3 should be used in Step 4.
4. Use the cluster centers identified in Step 3 to classify each image in the test folder as one of the three types (airplane, bird or truck), based on the shortest Euclidean distance to the centers.
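The four steps above can be sketched with NumPy as follows. This is a minimal illustration, not the reference solution: the function names (`sq_dists`, `kmeans`, `davies_bouldin`, `best_of_runs`, `class_to_cluster`, `classify`) are hypothetical, initial centers are assumed to be drawn from the training samples, and `X` is assumed to be a float array with one flattened image per row and `y` an array of integer class ids.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distances between rows of X and rows of C."""
    # Expanded form |x|^2 - 2 x.c + |c|^2 avoids a large 3-D temporary.
    return (X ** 2).sum(axis=1)[:, None] - 2.0 * (X @ C.T) + (C ** 2).sum(axis=1)[None, :]


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's algorithm from a random start; returns (centers, labels)."""
    # Random initialization (an assumption): k distinct samples become centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        new_labels = sq_dists(X, centers).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an emptied cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers, labels


def davies_bouldin(X, labels, centers):
    """Davies-Bouldin index; lower means tighter, better-separated clusters."""
    k = len(centers)
    scatter = np.zeros(k)  # mean distance from members to their center
    for i in range(k):
        members = X[labels == i]
        if len(members):
            scatter[i] = np.linalg.norm(members - centers[i], axis=1).mean()
    # gap[i, j]: distance between centers i and j.
    gap = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(gap, np.inf)  # exclude i == j from the max below
    ratio = (scatter[:, None] + scatter[None, :]) / gap
    return ratio.max(axis=1).mean()


def best_of_runs(X, k=3, n_runs=10, seed=0):
    """Steps 1 and 2: run K-means n_runs times, keep the lowest index."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    runs = [kmeans(X, k, rng) for _ in range(n_runs)]
    scores = [davies_bouldin(X, labels, centers) for centers, labels in runs]
    best = int(np.argmin(scores))
    return runs[best][0], runs[best][1], scores[best]


def class_to_cluster(labels, y, n_classes, k):
    """Step 3: for each true class, the cluster holding most of its images."""
    return np.array([np.bincount(labels[y == c], minlength=k).argmax()
                     for c in range(n_classes)])


def classify(X_test, centers, mapping):
    """Step 4: assign each test image to the class with the nearest center."""
    X_test = np.asarray(X_test, dtype=float)
    return sq_dists(X_test, centers[mapping]).argmin(axis=1)
```

A typical call sequence would be `centers, labels, score = best_of_runs(X_train)`, then `mapping = class_to_cluster(labels, y_train, 3, 3)`, then `pred = classify(X_test, centers, mapping)`. Note that nothing prevents two classes from mapping to the same cluster; in that case this sketch resolves the tie in favor of the lower class id.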

csen1022-assignment

Template for the Jupyter notebook used to submit CSEN1022 assignments

Folder Structure

Data
├── test
│   ├── airplane (1000 images)
│   ├── bird (1000 images)
│   └── truck (1000 images)
└── train
    ├── airplane (5000 images)
    ├── bird (5000 images)
    └── truck (5000 images)
Assignment_3.ipynb ── This is the only file that you need to work on and submit
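Given the folder layout above, one way to turn a split into arrays is sketched below. It is only an illustration: `load_split` and `CLASSES` are hypothetical names, the class ids (0 = airplane, 1 = bird, 2 = truck) are an arbitrary choice, and the code assumes every class folder contains nothing but image files of identical size.

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Hypothetical class order; the index in this tuple is used as the class id.
CLASSES = ("airplane", "bird", "truck")


def load_split(split_dir, classes=CLASSES):
    """Return (X, y): one flattened image per row and its integer class id."""
    features, targets = [], []
    for class_id, name in enumerate(classes):
        # Sort the paths so the image order is reproducible across runs.
        for path in sorted(Path(split_dir, name).iterdir()):
            with Image.open(path) as img:
                pixels = np.asarray(img.convert("RGB"), dtype=np.float32)
            # Flatten height x width x 3 into one vector, scaled to [0, 1].
            features.append(pixels.ravel() / 255.0)
            targets.append(class_id)
    return np.stack(features), np.array(targets)
```

For example, `X_train, y_train = load_split("Data/train")` would give one row per training image. Storing pixels as `float32` is a design choice here to limit memory use.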

Prerequisites

This repository requires that you have Python 3 with pip, plus the numpy, matplotlib, pillow and notebook packages.

Installation of Prerequisites

Easy way (More HD space, less hassle)

Install Anaconda then just run Jupyter.

Hard way (Less HD space, more hassle)

Install Python 3 or later

Make sure Python and pip are added to your environment variables

From your Linux, Mac, or Windows terminal, verify that both are installed correctly.

$ python --version
$ pip --version

Using the same terminal, install numpy, matplotlib, pillow and notebook

$ pip install numpy matplotlib pillow notebook
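To confirm that the packages are visible to the Python interpreter you will run the notebook with, a small check such as the following may help. It is a sketch only: `missing_packages` and `REQUIRED` are hypothetical names, and note that pillow is imported under the name `PIL`.

```python
import importlib.util

# Import names of the packages installed above (pillow is imported as PIL).
REQUIRED = ("numpy", "matplotlib", "PIL", "notebook")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing:", missing_packages() or "none")
```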

Alternative way (Cloud but you have to upload the data)

Create a new notebook in Google Colab

Upload the Data.zip folder

from google.colab import files
uploaded = files.upload()

Extract the zipped folder into the cloud

!unzip [foldername].zip

How To Run

From your terminal, run this command, then navigate to the Assignment_3.ipynb file

jupyter notebook

License

BSD 3-Clause "New" or "Revised" License