Python implementation of the sparse clustering methods of Witten and Tibshirani (2010).
Each sample has 1000 features, of which 1% are informative.
Hierarchical clustering | Sparse hierarchical clustering |
---|---|
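
The kind of data described above (many features, only about 1% carrying cluster signal) can be sketched with NumPy as follows. This is an illustrative assumption, not the demo's actual data generator: the function name `make_sparse_cluster_data`, the sample count, the number of clusters, and the shift size are all hypothetical choices.

```python
import numpy as np


def make_sparse_cluster_data(n_samples=60, n_features=1000, informative_frac=0.01,
                             n_clusters=3, shift=2.0, seed=0):
    """Hypothetical helper: Gaussian noise with cluster signal in a few features."""
    rng = np.random.default_rng(seed)
    n_informative = int(round(n_features * informative_frac))

    # Balanced cluster assignment: 0, 1, 2, 0, 1, 2, ...
    labels = np.arange(n_samples) % n_clusters

    # Start from pure noise in every feature.
    X = rng.standard_normal((n_samples, n_features))

    # Shift only the first n_informative features by a cluster-specific offset.
    offsets = shift * (labels - (n_clusters - 1) / 2.0)
    X[:, :n_informative] += offsets[:, None]
    return X, labels, n_informative


X, labels, n_informative = make_sparse_cluster_data()
print(X.shape, n_informative)
```

With the defaults, only the first 10 of the 1000 columns differ between clusters, which is the setting where feature-weighted (sparse) clustering is expected to help over clustering on all features equally.
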
- Sparse hierarchical clustering
- Sparse KMeans clustering
- Selection of tuning parameter for sparse hierarchical clustering
- Selection of tuning parameter for sparse KMeans clustering
git clone https://github.com/tsurumeso/pysparcl.git
cd pysparcl
python setup.py install
Perform sparse hierarchical clustering.
cd demo
python run.py
Perform sparse KMeans clustering.
cd demo
python run.py -m kmeans
import matplotlib.pyplot as plt
import pysparcl
from scipy.cluster.hierarchy import dendrogram
from scipy.cluster.hierarchy import linkage
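# X is a NumPy array of shape (n_samples, n_features).
# Select the tuning parameter (the bound on the feature weights).
perm = pysparcl.hierarchy.permute(X)
# Compute the sparse dissimilarities using the selected bound.
result = pysparcl.hierarchy.pdist(X, wbound=perm['bestw'])
# Build an average-linkage tree from the result and plot the dendrogram.
link = linkage(result['u'], method='average')
dendro = dendrogram(link)
plt.show()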
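
If flat cluster labels are needed rather than a dendrogram, a linkage matrix can be cut with SciPy's `fcluster`. The sketch below is a minimal, self-contained illustration that does not use pysparcl: the data and the plain Euclidean distances are stand-ins (assumptions) for the sparse dissimilarities computed above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Stand-in data: two well-separated groups of 10 samples with 5 features each.
X = np.vstack([
    rng.standard_normal((10, 5)),
    rng.standard_normal((10, 5)) + 10.0,
])

# Stand-in dissimilarities: condensed Euclidean distances.
dist = pdist(X)
link = linkage(dist, method='average')

# Cut the tree into at most 2 flat clusters; labels start at 1.
flat_labels = fcluster(link, t=2, criterion='maxclust')
print(flat_labels)
```

The same `fcluster` call should apply to any linkage matrix returned by `scipy.cluster.hierarchy.linkage`, including the `link` variable in the snippet above.
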
- [1] D. M. Witten and R. Tibshirani, "A framework for feature selection in clustering",
J. Am. Stat. Assoc., vol. 105, no. 490, pp. 713–726, 2010.
- [2] "sparcl: Perform sparse hierarchical clustering and sparse k-means clustering",
https://cran.r-project.org/web/packages/sparcl/index.html