Toolbox for large-scale subspace clustering
This project provides a Python implementation of the elastic net subspace clustering (EnSC) and the sparse subspace clustering by orthogonal matching pursuit (SSC-OMP) algorithms described in the following two papers:
- C. You, C.-G. Li, D. Robinson, R. Vidal, Oracle Based Active Set Algorithm for Scalable Elastic Net Subspace Clustering, CVPR 2016
- C. You, D. Robinson, R. Vidal, Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit, CVPR 2016
The clustering algorithms are implemented as two classes, ElasticNetSubspaceClustering and SparseSubspaceClusteringOMP, each of which has a fit function that learns the clusters. They may be used in the same way as KMeans, SpectralClustering, and the other estimators in sklearn.cluster.
import numpy as np
from cluster.selfrepresentation import ElasticNetSubspaceClustering
# generate 7 data points from 3 independent subspaces as columns of data matrix X
X = np.array([[1.0, -1.0, 0.0, 0.0, 0.0,  0.0,  0.0],
              [1.0,  0.5, 0.0, 0.0, 0.0,  0.0,  0.0],
              [0.0,  0.0, 1.0, 0.2, 0.0,  0.0,  0.0],
              [0.0,  0.0, 0.2, 1.0, 0.0,  0.0,  0.0],
              [0.0,  0.0, 0.0, 0.0, 1.0, -1.0,  0.0],
              [0.0,  0.0, 0.0, 0.0, 1.0,  1.0, -1.0]])
model = ElasticNetSubspaceClustering(n_clusters=3, algorithm='lasso_lars', gamma=50).fit(X.T)
print(model.labels_)
# this should give you array([1, 1, 0, 0, 2, 2, 2]) or a permutation of these labels
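Because cluster labels are arbitrary, a predicted labeling matches the expected one whenever both group the points in the same way. The following is a minimal sketch of such a check; the helper name `same_partition` is hypothetical and is not part of this package.

```python
def same_partition(labels_a, labels_b):
    """Return True if two labelings group the points identically, up to renaming of labels."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # Each label in one labeling must correspond to exactly one label in the other.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


expected = [1, 1, 0, 0, 2, 2, 2]
print(same_partition(expected, [0, 0, 2, 2, 1, 1, 1]))  # same grouping, labels renamed
print(same_partition(expected, [0, 0, 0, 2, 1, 1, 1]))  # third point moved to another cluster
```

In practice the second argument would be `model.labels_` from the example above.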
We compare EnSC and SSC-OMP with the k-means and spectral clustering algorithms on a synthetically generated dataset in which the data are sampled from a union of subspaces. See the file run_synthetic.py for details.
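To make "sampled from a union of subspaces" concrete, here is a minimal sketch of one way such data could be generated with numpy. It is not taken from run_synthetic.py: the function name `sample_union_of_subspaces`, its parameters, and the sizes used below are assumptions chosen for illustration.

```python
import numpy as np


def sample_union_of_subspaces(n_subspaces, ambient_dim, subspace_dim,
                              points_per_subspace, seed=0):
    """Draw unit-norm points from randomly oriented linear subspaces (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data, labels = [], []
    for k in range(n_subspaces):
        # Orthonormal basis of a random subspace, via QR of a Gaussian matrix.
        basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, subspace_dim)))
        # Random coefficients place each point inside the subspace.
        coeffs = rng.standard_normal((subspace_dim, points_per_subspace))
        points = (basis @ coeffs).T  # one row per data point
        # Scale every point to unit Euclidean norm.
        points /= np.linalg.norm(points, axis=1, keepdims=True)
        data.append(points)
        labels.append(np.full(points_per_subspace, k))
    return np.vstack(data), np.concatenate(labels)


# Illustrative sizes only: 5 subspaces of dimension 6 in a 9-dimensional space.
X, y = sample_union_of_subspaces(n_subspaces=5, ambient_dim=9,
                                 subspace_dim=6, points_per_subspace=100)
print(X.shape, y.shape)
```

Each row of `X` is one data point, which matches the row-per-sample layout passed to fit in the example above; `y` holds the ground-truth subspace index of each point.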
The following two figures report the clustering accuracy and running time as the size of the dataset increases from 500 to 0.5 million data points.
EnSC and SSC-OMP not only achieve significantly higher clustering accuracy than spectral clustering but are also much faster on large-scale data.
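Clustering accuracy is commonly defined as the fraction of points labeled correctly under the best one-to-one matching between predicted and true cluster labels. The sketch below implements that definition by brute force; whether run_synthetic.py computes accuracy in exactly this way is an assumption, and the function name `clustering_accuracy` is hypothetical.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, pred_labels):
    """Best-case fraction of matching labels over all one-to-one relabelings."""
    # Count how often each (predicted, true) label pair occurs.
    counts = Counter(zip(pred_labels, true_labels))
    pred_ids = sorted(set(pred_labels))
    true_ids = sorted(set(true_labels))
    # Pad with None so every predicted cluster can receive a (possibly dummy) true label.
    true_ids += [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for assigned in permutations(true_ids, len(pred_ids)):
        matched = sum(counts[(p, t)] for p, t in zip(pred_ids, assigned))
        best = max(best, matched)
    return best / len(true_labels)


truth = [0, 0, 1, 1, 2, 2, 2]
print(clustering_accuracy(truth, [1, 1, 0, 0, 2, 2, 2]))  # labels renamed, grouping identical
print(clustering_accuracy(truth, [1, 1, 0, 0, 2, 2, 0]))  # last point assigned to the wrong cluster
```

Trying every permutation is only practical for a handful of clusters. For larger numbers of clusters, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the more practical choice.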
Dependencies: numpy, scipy, scikit-learn
The SPAMS package (http://spams-devel.gforge.inria.fr/downloads.html) is recommended for faster computation. It may be used by setting algorithm='spams' in ElasticNetSubspaceClustering. On Ubuntu 16.04, SPAMS may be installed with the following commands:
sudo apt install liblapack-dev libopenblas-dev
pip install --index-url https://test.pypi.org/simple/ spams
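Since SPAMS is optional, a script may want to fall back to another solver when it is not installed. The following is a minimal sketch of that idea; the helper `pick_algorithm` is hypothetical, and it assumes the SPAMS Python module is importable under the name `spams`. The two algorithm names are the ones used elsewhere in this README.

```python
import importlib.util


def pick_algorithm(preferred="spams", fallback="lasso_lars"):
    """Return the preferred solver name, or the fallback if SPAMS is requested but missing."""
    # find_spec returns None when no top-level module with that name can be found.
    if preferred == "spams" and importlib.util.find_spec("spams") is None:
        return fallback
    return preferred


algorithm = pick_algorithm()
print(algorithm)
# The chosen name could then be passed on, for example:
# ElasticNetSubspaceClustering(n_clusters=3, algorithm=algorithm, gamma=50)
```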