ramp color by kmeans clustering result

Question


Opened this issue 5 years ago · 3 comments

If the distribution of the target feature looks like this:

we commonly want to assign colors based on breaks like this:

There are a few ways to do this (e.g. inspect the distribution plot, then work out the percentiles of the breaks).

But it would be much more convenient and accurate to apply k-means (or another clustering method) to find the breaks.

Currently, this takes several lines of code and some value transformations. I hope it can be included as one of the built-in color ramp options, alongside Quantile / EqualInterval / StdDev etc.
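For comparison, the manual approach mentioned above (pick break percentiles by eye, then bin the values) could be sketched roughly as follows. The sample values, the chosen percentiles, and the helper name `assign_bins` are all made up for illustration; only the Python standard library is used.

```python
from bisect import bisect_left
from statistics import quantiles

# Hypothetical, heavily skewed sample standing in for the target feature.
values = [1, 2, 2, 3, 4, 5, 7, 9, 40, 45, 60, 900]

# Percentiles picked by eye from a distribution plot (an assumption here).
chosen_percentiles = [50, 75, 90]

# quantiles(n=100) returns the 99 cut points for percentiles 1..99.
cuts = quantiles(values, n=100, method="inclusive")
breaks = [cuts[p - 1] for p in chosen_percentiles]


def assign_bins(data, upper_bounds):
    """Return, for each value, the index of the first break >= value."""
    return [bisect_left(upper_bounds, v) for v in data]


bins = assign_bins(values, breaks)
```

The weak point is the hand-picked `chosen_percentiles`: they have to be revisited whenever the data changes, which is what a clustering-based option would avoid.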

cc @andy-esch

Answer 1 · 2020-01-30T20:12:09.000Z

For example,

# KMeans is assumed to be scikit-learn's implementation
from sklearn.cluster import KMeans

# k-means on the target feature: confirmedCount
color_kmeans = KMeans(n_clusters=7)
data_province['color'] = color_kmeans.fit_predict(data_province['confirmedCount'].values.reshape(-1, 1))

# the cluster labels returned by k-means are not ordered by the target feature values.
# the following step relabels them by ascending cluster mean, which guarantees
# feature value of records in cluster 0 < in cluster 1 < 2 < 3 < ...
cluster_means = data_province.groupby('color')['confirmedCount'].mean().sort_values()
color_dict = {label: rank for rank, label in enumerate(cluster_means.index)}
data_province['color'] = data_province['color'].apply(lambda x: str(color_dict[x]))

# label each cluster (0, 1, 2, 3, ...) with the [min, max] of the values it contains.
range_dict = data_province.groupby('color').apply(lambda x: str([min(x['confirmedCount']), max(x['confirmedCount'])])).to_dict()
data_province['range'] = data_province['color'].apply(lambda x: range_dict[x])

# sorting by cluster keeps the legend items in order, so this works well with
# `color_category_legend` (the continuous legend doesn't work as expected)
data_province = data_province.sort_values(by='color')
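The relabeling step is the easiest part to get wrong, so here is a small dependency-free sketch of the same idea on made-up data. The function name `order_labels_by_mean` is hypothetical and not part of any library. It zero-pads the string labels as a precaution, because plain string labels stop sorting correctly once there are ten or more clusters ('10' sorts before '2').

```python
from collections import defaultdict


def order_labels_by_mean(values, labels):
    """Relabel clusters so that label rank follows ascending cluster mean.

    Returns (new_labels, ranges), where new_labels are zero-padded strings
    and ranges maps each new label to the [min, max] of its cluster.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    # Sort the original labels by the mean of the values they contain.
    by_mean = sorted(groups, key=lambda lab: sum(groups[lab]) / len(groups[lab]))

    # Zero-pad so that string sorting matches numeric order for 10+ clusters.
    width = len(str(len(by_mean) - 1))
    mapping = {old: str(rank).zfill(width) for rank, old in enumerate(by_mean)}

    new_labels = [mapping[lab] for lab in labels]
    ranges = {mapping[old]: [min(vals), max(vals)] for old, vals in groups.items()}
    return new_labels, ranges


# Made-up values with arbitrary (unordered) cluster ids, as k-means might return.
values = [5, 900, 7, 40, 45, 1000]
labels = [2, 0, 2, 1, 1, 0]
new_labels, ranges = order_labels_by_mean(values, labels)
```

With seven clusters, as in the snippet above, the padding makes no difference; it only matters if `n_clusters` grows past nine.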

Answer 2 · 2020-02-13T19:38:16.000Z

It seems pygeoda now supports NaturalBreaks.

Answer 3 · 2020-02-28T18:18:49.000Z

https://github.com/pysal/mapclassify
https://pysal.org/mapclassify/

cc @andy-esch