/kmeans_mapreduce_thunders

K-means on Map Reduce implementation (Python) for thunders locations clustering on South hemisphere

Primary LanguagePythonMIT LicenseMIT

This is a Python implementation of K-Means clustering algorithm using the Map Reduce paradigm. It is customized for processing a thunders dataset, extracted from STARNET (Sferics Timing And Ranging NETwork)

Under src/ folder are the mapper and reducer scripts that can be run on a Hadoop environment. There is also a script to run the Hadoop job on Amazon Elastic Map Reduce (run_kmeans_emr.py).
Under data/ folder is a sample of the thunders that "falled" on February 28, 2014.
Under results/ folder are CSV datasets containing all thunders of that date in a single file and the resulting clusters (for k=10 and k=50).

In the picture bellow are presented a thunders distribution map and a heat map to show concentration, generated by ArcGIS Desktop. Mapas da distribuição de raios e mapa de calor

In the following maps, you can see the generated cluster distribution (for k=10 and k=50) against the heat map, showing that K-Means algorithm worked as expected, trying to minimize the distance among clusters and data points. Mapas com os clusters gerados (k=10 e k=50)

A detailed description of the problem and this implementation is available in this post