I work with Persian-language texts that request psychological advice. I clean and prepare the data with the Hazm project, cluster it with the Genetic_Kmeans algorithm, and compare the results with standard K-means and Birch.
- Min-max normalization to bring the features onto a common scale
- Davies–Bouldin index for evaluating the clusters
- In the genetic algorithm:
  - Rank-based selection
  - One-point crossover
- pandas
- numpy
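The min-max normalization and Davies–Bouldin evaluation listed above can be sketched with NumPy alone. This is a minimal illustration, not the code from this repository: the function names `minmax_normalize` and `davies_bouldin` are hypothetical, and Euclidean distance is assumed.

```python
import numpy as np


def minmax_normalize(X):
    """Scale every column of X to the range [0, 1] (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0, no division by zero
    return (X - col_min) / col_range


def davies_bouldin(X, labels):
    """Davies-Bouldin index; lower means more compact, better separated clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("need at least two clusters")
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    # scatter[i]: mean Euclidean distance from the points of cluster i to its centroid
    scatter = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    # dist[i, j]: distance between the centroids of clusters i and j
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # excludes the i == j pair from the maximum
    ratios = (scatter[:, None] + scatter[None, :]) / dist
    # worst-case similarity per cluster, averaged over all clusters
    return ratios.max(axis=1).mean()
```

A typical use would be to normalize the data once, then call `davies_bouldin` on each candidate labeling and prefer the one with the lowest value.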
python __main__.py
- The data I analyze is the Iris dataset: data/iris.csv has 3 columns, data/iris2.csv has 4 columns, and data/isis_with_header.csv includes a header row.
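Because some of the data files have a header row and others do not, loading them needs a small switch. The sketch below shows one way to do this with pandas; `load_points` is a hypothetical helper, and it assumes every column is numeric.

```python
import pandas as pd


def load_points(source, has_header=False):
    """Read a CSV of numeric features into a NumPy array.

    `source` is a file path or a file-like object (hypothetical helper).
    """
    # header=None tells pandas the first row is data, not column names
    header = 0 if has_header else None
    frame = pd.read_csv(source, header=header)
    return frame.to_numpy(dtype=float)
```

For example, `load_points("data/iris.csv")` would treat the first row as data, while `load_points("data/isis_with_header.csv", has_header=True)` would skip the header row.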
config.txt contains the control parameters:
- kmax : maximum number of clusters
- budget : how many times the GA is run
- numOInd : number of individuals
- Ps : probability of rank-based selection
- Pc : probability of crossover
- Pm : probability of mutation
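To make the roles of Ps, Pc and Pm concrete, here is a rough sketch of the three genetic operators. It is not the code from this repository, and it rests on several assumptions: a chromosome is a 1-D NumPy array of values in [0, 1), lower fitness is better (as with the Davies–Bouldin index), and rank-based selection gives the individual at rank r a weight of Ps * (1 - Ps) ** r with Ps in (0, 1]. All function names are hypothetical.

```python
import numpy as np


def rank_selection(fitness, Ps, rng):
    """Pick one individual's index by rank; rank 0 is the lowest (best) fitness."""
    order = np.argsort(fitness)  # best individual first
    # assumed scheme: geometric weights Ps * (1 - Ps) ** rank
    weights = Ps * (1.0 - Ps) ** np.arange(len(order))
    pick = rng.choice(len(order), p=weights / weights.sum())
    return int(order[pick])


def one_point_crossover(a, b, Pc, rng):
    """With probability Pc, swap the tails of two parents at a random cut point."""
    if len(a) < 2 or rng.random() >= Pc:
        return a.copy(), b.copy()
    cut = rng.integers(1, len(a))  # cut position in 1 .. len(a) - 1
    child1 = np.concatenate([a[:cut], b[cut:]])
    child2 = np.concatenate([b[:cut], a[cut:]])
    return child1, child2


def mutate(chrom, Pm, rng):
    """Replace each gene with a fresh value in [0, 1) with probability Pm."""
    mask = rng.random(len(chrom)) < Pm
    out = chrom.copy()
    out[mask] = rng.random(int(mask.sum()))
    return out
```

One generation would then select two parents with `rank_selection`, pass them through `one_point_crossover`, and apply `mutate` to each child, using a generator such as `np.random.default_rng()` for `rng`.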
norm_data.csv is the normalized data, cluster_json is the centroid of each cluster, and result.csv is the data with each point labeled by its cluster.
- Accuracy of the GA with K-means: 88%
- Accuracy of k-means++: 83%