This is the code for the AAAI 2021 paper: Variational Fair Clustering. The method finds clusters with specified proportions of the demographic groups defined by a sensitive attribute of the dataset (e.g. race or gender). It works with well-known clustering objectives such as K-means, K-median, or spectral clustering (Normalized cut), in a flexible and scalable way.
- The code is tested on Python 3.6. Install the requirements listed in requirements.txt using pip or conda.
- Download the datasets, other than the synthetic ones, from the links given in the paper and put each one in its data/[dataset] directory.
To evaluate the code, run the following script:
sh evaluate_Fair_clustering.sh
Change the options inside the script as needed. The options are described in test_fair_clustering.py. Note that the weight of the fairness term (--lmbda) can be set much higher (even above 100) to impose fairness more strongly. --lmbda controls the trade-off between the clustering objective and fairness, as discussed in the paper.
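To make the role of --lmbda concrete, here is a minimal sketch of a weighted objective of the form "clustering cost + lmbda * fairness penalty". It assumes a K-means cost and a KL-divergence penalty between the target proportions and each cluster's observed group proportions. The function names (`kmeans_cost`, `fairness_penalty`, `total_objective`) are hypothetical and for illustration only; this is not the repository's implementation.

```python
import numpy as np

def kmeans_cost(X, labels, n_clusters):
    """Sum of squared distances of points to their cluster mean."""
    cost = 0.0
    for k in range(n_clusters):
        pts = X[labels == k]
        if len(pts):
            cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return float(cost)

def fairness_penalty(labels, groups, target, n_clusters, eps=1e-12):
    """Assumed penalty: sum over clusters of KL(target || observed proportions)."""
    target = np.asarray(target, dtype=float)
    penalty = 0.0
    for k in range(n_clusters):
        members = groups[labels == k]
        if len(members) == 0:
            continue  # an empty cluster contributes nothing in this sketch
        observed = np.bincount(members, minlength=len(target)) / len(members)
        penalty += np.sum(target * np.log((target + eps) / (observed + eps)))
    return float(penalty)

def total_objective(X, labels, groups, target, n_clusters, lmbda):
    """Clustering cost plus the fairness penalty weighted by lmbda."""
    return (kmeans_cost(X, labels, n_clusters)
            + lmbda * fairness_penalty(labels, groups, target, n_clusters))

if __name__ == "__main__":
    # Toy data: two tight spatial blobs, each made up of a single group.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
    groups = np.array([0, 0, 1, 1])
    tight = np.array([0, 0, 1, 1])   # compact clusters, but group-pure
    fair = np.array([0, 1, 0, 1])    # 50/50 clusters, but spread out
    for lmbda in (0.0, 100.0):
        print(lmbda,
              total_objective(X, tight, groups, [0.5, 0.5], 2, lmbda),
              total_objective(X, fair, groups, [0.5, 0.5], 2, lmbda))
```

In this toy example, the compact but group-pure assignment has the lower objective when lmbda is 0, while the balanced assignment has the lower objective once lmbda is large, which is the trade-off the option controls.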
For the Synthetic dataset with two equal demographic groups (50/50) and the Synthetic-unequal dataset with uneven proportions (75/25), the required proportions can be imposed during clustering by increasing the weight of the fairness term (--lmbda). With a suitable lambda, each cluster attains the required proportions.
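One way to check whether a given lambda achieved the required proportions is to compute the demographic mix of every cluster and compare it with the target. The sketch below assumes the cluster assignments and group memberships are available as integer NumPy arrays. The helper `cluster_proportions` is a hypothetical name and is not part of this repository.

```python
import numpy as np

def cluster_proportions(labels, groups, n_clusters, n_groups):
    """Return an (n_clusters, n_groups) array of per-cluster group proportions."""
    counts = np.zeros((n_clusters, n_groups))
    # Count how many points of each group fall in each cluster.
    np.add.at(counts, (labels, groups), 1)
    sizes = counts.sum(axis=1, keepdims=True)
    # Guard against division by zero for empty clusters.
    return counts / np.maximum(sizes, 1)

if __name__ == "__main__":
    # Toy assignment for a 75/25 target: cluster 0 has 8 points, cluster 1 has 4.
    labels = np.array([0] * 8 + [1] * 4)
    groups = np.array([0] * 6 + [1] * 2 + [0] * 3 + [1] * 1)
    target = np.array([0.75, 0.25])

    props = cluster_proportions(labels, groups, n_clusters=2, n_groups=2)
    print(props)
    # Largest absolute gap between any cluster's mix and the target.
    print(np.abs(props - target).max())
```

If the largest gap is still too big for your purposes, rerunning with a higher --lmbda is the adjustment described above.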