/cluster-validity-indices

Cluster validity indices for fuzzy c-means

Introduction: 
The need for participants to collaborate with one another on a shared analysis gives rise to the concept of collaborative learning. However, the possibility that the cloud service is semi-honest poses a key challenge: preserving the participants' privacy. We address this challenge with a two-stage scheme called RG+RP. In the first stage, each participant perturbs their data by passing it through a nonlinear function called repeated Gompertz (RG). In the second stage, the participant projects the perturbed data to a lower dimension in an (almost) distance-preserving manner, using a specific random projection (RP) matrix. The nonlinear RG function is designed to mitigate maximum a posteriori (MAP) estimation attacks, while random projection resists independent component analysis (ICA) attacks and ensures clustering accuracy. The proposed two-stage randomisation scheme is assessed in terms of its recovery resistance to MAP estimation attacks. Preliminary theoretical analysis, as well as experimental results on synthetic and real-world datasets, indicates that RG+RP has better recovery resistance to MAP estimation attacks than most state-of-the-art techniques. For clustering, fuzzy c-means (FCM) is used. Results using seven cluster validity indices (CVIs), root mean squared error (RMSE) and accuracy ratio show that clustering results based on two-stage-perturbed data are comparable to the clustering results based on raw data. This confirms the utility of our privacy-preserving scheme when used with either FCM or hard c-means (HCM).
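The two stages described above can be illustrated with a minimal sketch. It is not the repository's RG+RP implementation: the repeated Gompertz function and the specific RP matrix are not defined in this README, so the sketch assumes a plain Gompertz curve g(x) = a*exp(-b*exp(-c*x)) and a Gaussian random matrix as stand-ins. All variable names, sizes and parameter values are hypothetical.

```matlab
% Illustrative two-stage perturbation (stand-in only, NOT the repository's RG+RP code).
rng(0);                          % fix the seed so the run is repeatable
n = 100; d = 20; k = 8;          % hypothetical sizes: n samples, projected from d to k dimensions
X = rand(n, d);                  % toy data in [0, 1]

% Stage 1: element-wise nonlinear perturbation with a plain Gompertz curve.
% ASSUMPTION: a, b, c are arbitrary illustrative values, not the paper's settings.
a = 1; b = 2; c = 3;
gompertz = @(x) a .* exp(-b .* exp(-c .* x));
Y = gompertz(X);

% Stage 2: Gaussian random projection (ASSUMPTION: the paper uses a specific RP matrix).
% Scaling by 1/sqrt(k) preserves squared Euclidean distances in expectation.
R = randn(d, k) ./ sqrt(k);
Z = Y * R;

% Compare one pairwise distance before and after projection.
dBefore = norm(Y(1, :) - Y(2, :));
dAfter  = norm(Z(1, :) - Z(2, :));
fprintf('distance before: %.3f, after: %.3f\n', dBefore, dAfter);
```

The two printed distances will generally be close but not equal; how close depends on the target dimension k.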


How to run:
To reproduce the PrivacyTest result (repeated Gompertz + random projection matrix) for purely Gaussian datasets under a maximum a posteriori (MAP) estimation attack, run the commands below:
% 9 = two_Gompertz+RP; 1 = MAP estimation; last argument: 0 = recover normal points, 1 = recover outliers
runPrivacyTest_DKE('Gaussian', 9, 1, 0);
runPrivacyTest_DKE('Gaussian', 9, 1, 1);
Concatenate the RP-perturbed or two-stage-perturbed data and feed it to the clustering algorithm to compute the CVIs by running:
fuzzy_twostage_s1
fuzzyComparisonCVI: computes seven CVIs: ARI, RI, MI, NMIsqrt, VI, NVI and JVI
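As a rough illustration of what two of these indices measure, the sketch below computes the Rand index (RI) and adjusted Rand index (ARI) for two crisp label vectors from their contingency table. It is independent of fuzzyComparisonCVI, whose handling of fuzzy partitions may differ; the label vectors and variable names are hypothetical.

```matlab
% Crisp Rand index (RI) and adjusted Rand index (ARI) for two label vectors.
% ASSUMPTION: hard (crisp) labels; this is not the repository's fuzzyComparisonCVI.
u = [1 1 2 2 3 3];               % hypothetical reference labels
v = [1 1 2 3 3 3];               % hypothetical candidate labels
n = numel(u);

% Contingency table: N(i,j) = number of points with label i in u and label j in v.
[~, ~, ui] = unique(u);
[~, ~, vi] = unique(v);
N = accumarray([ui(:), vi(:)], 1);

pairs = @(x) x .* (x - 1) ./ 2;      % number of unordered pairs, "x choose 2"
sumCells = sum(pairs(N(:)));         % pairs grouped together in both partitions
sumRows  = sum(pairs(sum(N, 2)));    % pairs grouped together in u
sumCols  = sum(pairs(sum(N, 1)));    % pairs grouped together in v
total    = pairs(n);                 % all pairs of points

RI = (total + 2 * sumCells - sumRows - sumCols) / total;
expected = sumRows * sumCols / total;    % expected value of sumCells under random labelling
ARI = (sumCells - expected) / ((sumRows + sumCols) / 2 - expected);
fprintf('RI = %.4f, ARI = %.4f\n', RI, ARI);
```

For these toy labels, RI = 0.8 and ARI = 1.2/2.7, roughly 0.4444. ARI is lower because it corrects RI for the agreement expected by chance.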

Requirements:
Matlab
KDE toolbox (https://www.ics.uci.edu/~ihler/code/kde.html):
Put the @kde folder in a directory of your choice and add that directory to the Matlab path. For example:
if the folder is /home/ihler/myMatlabCode/@kde, add '/home/ihler/myMatlabCode' to the path.
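One possible way to set the path from the Matlab prompt is sketched below. The directory is the example location given above and should be replaced with your own; the variable name kdeParent is hypothetical.

```matlab
% Add the directory that CONTAINS @kde (not the @kde folder itself) to the path.
kdeParent = '/home/ihler/myMatlabCode';   % example location; adjust to your setup
if exist(kdeParent, 'dir')
    addpath(kdeParent);
else
    fprintf('Directory not found: %s\n', kdeParent);
end

% which returns the location of kde if it is now visible, or an empty result otherwise.
disp(which('kde'));
```

Calling addpath only affects the current session; use savepath or a startup.m file if the setting should persist.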

Remember to cite the following papers if you use any of the code:
@inproceedings{lyu2017privacy,
  title={Privacy-Preserving Collaborative Deep Learning with Application to Human Activity Recognition},
  author={Lyu, Lingjuan and He, Xuanli and Law, Yee Wei and Palaniswami, Marimuthu},
  booktitle={Proceedings of the 2017 ACM on Conference on Information and Knowledge Management},
  pages={1219--1228},
  year={2017},
  organization={ACM}
}
@article{lyu2018privacy,
  title={Privacy-preserving collaborative fuzzy clustering},
  author={Lyu, Lingjuan and Bezdek, James C and Law, Yee Wei and He, Xuanli and Palaniswami, Marimuthu},
  journal={Data \& Knowledge Engineering},
  year={2018},
  publisher={Elsevier}
}