hadoop-kmeans

Python implementation of the k-means clustering algorithm in MapReduce.

Primary language: TeX. License: GNU General Public License v3.0 (GPL-3.0).

  1. Hadoop Installation
  2. Dataset Creation
    1. createDataset.py
    2. Plot of data points
  3. K-means Clustering Algorithm
    1. Instructions for running k-means in Cloudera
    2. run.sh & reader.py
      1. run.sh
      2. reader.py
    3. MapReduce
      1. mapper.py
      2. reducer.py
    4. Plot Representation