Iris-Clustering

Description:

In this notebook, we use sklearn to perform hierarchical clustering on the Iris dataset, which contains 150 samples with 4 dimensions/attributes each. Each sample is labeled as one of three types of Iris flower.

In this exercise, we'll ignore the labels and cluster based on the attributes alone. We'll then compare the results of different hierarchical clustering techniques against the original labels to see which one does a better job in this scenario, and finally visualize the resulting cluster hierarchies.
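
As a rough sketch of that comparison (not necessarily the exact code in the notebook), the snippet below fits sklearn's `AgglomerativeClustering` with three linkage strategies and scores each result against the true labels. The choice of linkages (`ward`, `complete`, `average`) and of the adjusted Rand index as the comparison metric are assumptions made for illustration.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
# X: 150 samples x 4 attributes; y (the true species) is used only for scoring
X, y = iris.data, iris.target

scores = {}
# Assumed set of linkage strategies to compare
for linkage in ("ward", "complete", "average"):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X)  # clustering never sees y
    # Adjusted Rand index: 1.0 is a perfect match, about 0.0 is random labeling
    scores[linkage] = adjusted_rand_score(y, labels)

for linkage, score in scores.items():
    print(f"{linkage:>8}: ARI = {score:.3f}")
```

The adjusted Rand index is convenient here because it ignores how the cluster ids happen to be numbered, so the cluster labels do not need to be aligned with the species labels first.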

Note: This project is based on what I learned from Udacity.

Advantages and disadvantages of hierarchical clustering:

Advantages:

  • The resulting hierarchical representations are very informative. They give us an additional way to visualize the structure of the dataset.
  • It is especially effective when the dataset contains real hierarchical relationships.
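
The hierarchy mentioned above can be built and inspected with SciPy. This is a minimal sketch that assumes Ward linkage and uses `scipy.cluster.hierarchy` rather than sklearn. It passes `no_plot=True` so that it runs without a plotting backend; dropping that argument and calling `matplotlib.pyplot.show()` should draw the dendrogram.

```python
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import load_iris

X = load_iris().data

# Each row of Z records one merge: the two clusters joined,
# the distance between them, and the size of the new cluster
Z = linkage(X, method="ward")  # Ward linkage is an assumption for this sketch

# no_plot=True returns the dendrogram layout without drawing it
tree = dendrogram(Z, no_plot=True)
leaf_order = tree["leaves"]  # sample indices in left-to-right leaf order

# Cut the hierarchy into at most 3 flat clusters
flat_labels = fcluster(Z, t=3, criterion="maxclust")

print(Z.shape)
print(len(leaf_order))
print(len(set(flat_labels)))
```

Because the full merge history is stored in `Z`, the same tree can be cut at a different number of clusters without re-running the clustering.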

Disadvantages:

  • It is sensitive to outliers and noise, so the data needs to be cleaned up beforehand.
  • It is computationally intensive: at least O(N^2) in the number of samples N.
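
To give a feel for the quadratic cost, the sketch below counts the pairwise distances between Iris samples using SciPy's `pdist`. Using Euclidean distance (the `pdist` default) is an assumption, and the count is only a proxy for the work a clustering implementation actually does. With N samples there are N(N-1)/2 distinct pairs, which is 11,175 for N = 150.

```python
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris

X = load_iris().data
n_samples = X.shape[0]

# Condensed distance vector: one entry per unordered pair of samples
condensed = pdist(X)  # Euclidean by default

# N * (N - 1) / 2 distinct pairs, which grows quadratically with N
expected_pairs = n_samples * (n_samples - 1) // 2

print(n_samples, condensed.shape[0], expected_pairs)
```

Doubling the number of samples roughly quadruples the number of pairs, which is why this approach becomes expensive on large datasets.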