/kmeans-dbscan-tutorial

A clustering tutorial with scikit-learn for beginners.

Contents

  1. Introduction to k-means, k-means++, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise).

  2. Explore common drawbacks of k-means, such as:

  • Need to choose the right number of clusters.
  • Cannot handle noisy data and outliers.
  • Cannot handle non-spherical data.

     Solutions for each of these drawbacks are also presented.

  3. Introduction to supervised and unsupervised methods for measuring cluster quality, such as homogeneity, completeness, and the Silhouette Coefficient (part of section 2).

  4. Two simple exercises (k-means and DBSCAN) to accompany the tutorial.
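
As a quick preview of the topics above, here is a minimal sketch that runs k-means and DBSCAN on a non-spherical dataset and scores both with homogeneity, completeness, and the Silhouette Coefficient. It is not taken from the tutorial notebooks: the `make_moons` dataset, the `eps` and `min_samples` values, and the `evaluate` helper are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import completeness_score, homogeneity_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Two interleaving half-moons: a common non-spherical toy dataset (assumed here,
# not necessarily the data used in the tutorial).
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# k-means with k-means++ initialization; the number of clusters must be supplied.
kmeans_labels = KMeans(
    n_clusters=2, init="k-means++", n_init=10, random_state=0
).fit_predict(X)

# DBSCAN finds the number of clusters itself and labels noise points as -1.
# eps and min_samples are illustrative guesses, not tuned values.
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)


def evaluate(labels):
    """Hypothetical helper: collect the quality measures for one labeling."""
    kept = labels != -1  # exclude DBSCAN noise points from the silhouette
    n_clusters = len(set(labels[kept]))
    return {
        "n_clusters": n_clusters,
        "n_noise": int(np.sum(~kept)),
        # Supervised measures: they compare against the ground-truth labels.
        "homogeneity": homogeneity_score(y_true, labels),
        "completeness": completeness_score(y_true, labels),
        # Unsupervised measure: it needs only the data and the predicted labels.
        "silhouette": (
            silhouette_score(X[kept], labels[kept])
            if n_clusters >= 2
            else float("nan")
        ),
    }


scores = {"k-means": evaluate(kmeans_labels), "DBSCAN": evaluate(dbscan_labels)}
for name, result in scores.items():
    print(name, {k: round(v, 3) if isinstance(v, float) else v for k, v in result.items()})
```

On data like this, DBSCAN would be expected to follow the moon shapes more closely than k-means. Note that the Silhouette Coefficient tends to favor convex clusters, so it can rate the k-means result higher even when the supervised measures say otherwise.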

Get Started

  • Please refer to the slides in slides/ or view them on Google Drive; both Chinese and English versions are available.
  • The code is in tutorial_and_labs/; each .ipynb notebook has a corresponding .html file.