A script that implements a parallelized k-means clustering algorithm using Python's multiprocessing module.
Type the following in IPython: run kmeans.py num_k num_processors max_iterations (e.g. run kmeans.py 5 3 30).
Cluster centroids will be saved in a variable named centroids.
Cluster assignments will be saved in a variable named all_assign.
The dataset is an altered and simplified version of combined Home Mortgage Disclosure Act and Census ACS data from an Urban Institute project: https://adrf.urban.org/
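
For readers who want a feel for how the work can be split across processes, here is a minimal sketch of one common approach. It is not necessarily how kmeans.py itself is written. The function names assign_chunk and parallel_kmeans, the seed parameter, and the synthetic 2-D data are assumptions made for illustration. Only num_k, num_processors, max_iterations, centroids, and all_assign come from the description above. The sketch uses only the standard library, so points are plain lists of floats.

```python
import random
from multiprocessing import Pool


def assign_chunk(args):
    """Assign each point in a chunk to its nearest centroid.

    Returns the assignments plus per-cluster coordinate sums and counts,
    so the parent process can recompute centroids from small summaries.
    """
    points, centroids = args
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    assign = []
    for p in points:
        # Squared Euclidean distance is enough to pick the nearest centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        j = dists.index(min(dists))
        assign.append(j)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return assign, sums, counts


def parallel_kmeans(data, num_k, num_processors, max_iterations, seed=0):
    """Hypothetical driver: returns (centroids, all_assign)."""
    rng = random.Random(seed)
    # Initialize centroids from num_k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(data, num_k)]
    # Split the data into one contiguous chunk per worker.
    size = -(-len(data) // num_processors)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    all_assign = []
    with Pool(num_processors) as pool:
        for _ in range(max_iterations):
            # Assignment step runs in parallel; map preserves chunk order.
            results = pool.map(assign_chunk, [(c, centroids) for c in chunks])
            new_assign = [a for r in results for a in r[0]]
            # Update step: combine the partial sums and counts per cluster.
            new_centroids = []
            for j in range(num_k):
                n = sum(r[2][j] for r in results)
                if n == 0:
                    # Keep the old centroid if a cluster ends up empty.
                    new_centroids.append(centroids[j])
                    continue
                dim = len(centroids[j])
                new_centroids.append(
                    [sum(r[1][j][d] for r in results) / n for d in range(dim)]
                )
            converged = new_assign == all_assign
            centroids, all_assign = new_centroids, new_assign
            if converged:
                break  # assignments stopped changing
    return centroids, all_assign


if __name__ == "__main__":
    # Synthetic data for illustration only, not the mortgage dataset.
    rng = random.Random(1)
    data = [
        [rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
        for cx, cy in [(0, 0), (5, 5), (0, 5)]
        for _ in range(100)
    ]
    centroids, all_assign = parallel_kmeans(
        data, num_k=3, num_processors=2, max_iterations=30
    )
    print(centroids)
```

Each worker returns per-cluster sums and counts rather than raw points, so the amount of data sent back to the parent process stays small no matter how large each chunk is. The `if __name__ == "__main__":` guard matters because multiprocessing may re-import the main module in child processes on some platforms.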