A parallelized approach to k-means clustering

Primary Language: Python

Parallelized K-means Clustering

A script that implements a parallelized k-means clustering algorithm using Python's multiprocessing module.

Use

In IPython, type: run kmeans.py num_k num_processors max_iterations (e.g. run kmeans.py 5 3 30)
The cluster centroids are saved in a variable named centroids.
The cluster assignments are saved in a variable named all_assign.
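
The three arguments above could map onto the algorithm roughly as in the sketch below. This is not the code in kmeans.py; it is a minimal pure-Python illustration that assumes the points are split into one chunk per worker process and that only the assignment step runs in parallel. The function names (assign_chunk, update_centroids, parallel_kmeans), the seed parameter, and the stop-when-assignments-repeat rule are all hypothetical choices made for the example.

```python
import multiprocessing as mp
import random


def assign_chunk(points, centroids):
    """Return the index of the nearest centroid for each point in one chunk."""
    assignments = []
    for p in points:
        # Squared Euclidean distance from p to every centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        assignments.append(dists.index(min(dists)))
    return assignments


def update_centroids(points, assignments, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == k]
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its old centroid
            continue
        new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return new_centroids


def parallel_kmeans(points, num_k, num_processors, max_iterations, seed=0):
    """Hypothetical sketch: points is a non-empty list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, num_k)  # assumed initialization: random points

    # Split the points into contiguous chunks, at most one per worker.
    chunk_size = -(-len(points) // num_processors)  # ceiling division
    chunks = [points[i:i + chunk_size] for i in range(0, len(points), chunk_size)]

    # Only start worker processes when more than one is requested.
    pool = mp.Pool(num_processors) if num_processors > 1 else None
    all_assign = []
    try:
        for _ in range(max_iterations):
            if pool is None:
                results = [assign_chunk(c, centroids) for c in chunks]
            else:
                results = pool.starmap(assign_chunk, [(c, centroids) for c in chunks])
            new_assign = [a for part in results for a in part]
            if new_assign == all_assign:
                break  # assignments stopped changing
            all_assign = new_assign
            centroids = update_centroids(points, all_assign, centroids)
    finally:
        if pool is not None:
            pool.close()
            pool.join()
    return centroids, all_assign


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, all_assign = parallel_kmeans(data, num_k=2, num_processors=2, max_iterations=30)
    print(centroids)
    print(all_assign)
```

The centroid update stays in the parent process here because it is cheap compared with the distance computations; a real implementation might parallelize it as well by having each worker return partial sums. The `if __name__ == "__main__":` guard matters on platforms where multiprocessing starts workers by re-importing the main module.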

Data

The dataset is an altered and simplified version of combined Home Mortgage Disclosure Act and Census ACS data from an Urban Institute project: https://adrf.urban.org/