
MEP-ONMF

Maximum-Entropy-Principle Orthogonal Non-negative Matrix Factorization

Deterministic Annealing Clustering (Maximum-Entropy-Principle Clustering)

Deterministic Annealing (DA) is a clustering, or more accurately, a facility-location algorithm that partitions data points into several groups of different sizes so that the cumulative distance of each data point to its assigned resource is minimized (for details on the algorithm, please refer to our paper). To cluster data into $k$ clusters, use the code snippet below:

| parameter | description |
| --- | --- |
| k | number of clusters to partition the data into |
| tol | tolerance at which bifurcation happens for the critical cluster; smaller values are more stable but result in slower performance |
| max_iter | maximum number of iterations to wait for the parameters to converge |
| alpha | factor by which $\beta$ is multiplied at the end of each convergence; smaller values give more stable performance but slow down the code |
| purturb | rate at which new cluster centers are perturbed |
| beta_final | final $\beta$ value at which the algorithm stops; if not specified, it is determined automatically |
| verbos | whether to print information at each iteration |
| normalize | whether to 'l2'-normalize the input data |
| input | description |
| --- | --- |
| X | data matrix $X \in \mathbb{R}^{d \times n}$, where $d$ is the dimension of each data point and $n$ is the number of data points |

| output | description |
| --- | --- |
| Y | cluster centers (resource locations) $Y \in \mathbb{R}^{d \times k}$ |
| P | association matrix $P \in \mathbb{R}^{k \times n}$ giving the degree of membership of each data point to each cluster center (resource) |
{ 
  from Deterministic_Annealing import DA

  model = DA(k, tol=1e-4, max_iter=1000, alpha=1.05,
             purturb=0.01, beta_final=None, verbos=0, normalize=False)
  model.fit(X, Px='auto')       # X: d x n data matrix
  Y, P = model.cluster()        # Y: d x k centers, P: k x n associations
  print(model.return_cost())    # final clustering cost
}
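
For a self-contained run, here is a usage sketch on synthetic data; the blobs, the seed, and the parameter values are illustrative assumptions, not part of the library:

{ 
  import numpy as np
  from Deterministic_Annealing import DA

  # Illustrative synthetic data: three Gaussian blobs, stored in the
  # d x n layout (columns are data points).
  rng = np.random.default_rng(0)
  means = np.array([[0.0, 5.0, 10.0], [0.0, 5.0, 0.0]])   # d x 3 blob means
  X = np.hstack([m[:, None] + 0.5 * rng.normal(size=(2, 100)) for m in means.T])

  model = DA(3, tol=1e-4, max_iter=1000, alpha=1.05,
             purturb=0.01, beta_final=None, verbos=0, normalize=False)
  model.fit(X, Px='auto')
  Y, P = model.cluster()        # Y: 2 x 3 centers, P: 3 x 300 associations
  print(model.return_cost())
}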

For example, for a dataset consisting of 16 pseudo-symmetric clusters, as illustrated in Fig. 1, DA finds the cluster centers:

{ 
  model.plot()
}

In the animation below, you can see how these cluster centers split as the value of $\beta$ increases. In the beginning there is only one cluster center; as $\beta$ reaches a critical value for a cluster center, that center bifurcates. Here, animation_frame corresponds to the $\beta$ value.

{ 
  model.animation()
}
(Video: Clustering.mp4)
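
For intuition about what happens during annealing, here is a bare-bones, standalone sketch of a deterministic annealing loop: Gibbs association updates alternate with weighted-mean center updates while $\beta$ is slowly raised, and centers are perturbed so that coincident centers can split at critical $\beta$ values. This only illustrates the principle (the actual implementation also detects the critical $\beta$ values explicitly); all names and constants are assumptions:

{ 
  import numpy as np

  def da_sketch(X, k, beta=1e-4, alpha=1.05, beta_final=1e3, eps=1e-2):
      """Bare-bones deterministic annealing on a d x n data matrix X."""
      rng = np.random.default_rng(0)
      Y = np.repeat(X.mean(axis=1, keepdims=True), k, axis=1)  # all centers start at the centroid
      while beta < beta_final:
          Y = Y + eps * rng.normal(size=Y.shape)    # perturb so coincident centers can split
          for _ in range(50):                       # fixed-point iterations at this beta
              D = ((X[:, None, :] - Y[:, :, None]) ** 2).sum(axis=0)  # k x n squared distances
              P = np.exp(-beta * (D - D.min(axis=0)))  # Gibbs associations (shifted for stability)
              P /= P.sum(axis=0)
              Y = (X @ P.T) / P.sum(axis=1)         # centers = association-weighted means
          beta *= alpha                             # anneal: raise beta
      return Y, P
}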

These critical $\beta$ values provide useful information about our dataset. By plotting the log of these values using the command below, we can determine the true number of clusters in our dataset:

{ 
  model.plot_criticals(log=True)
}

(Figure: log of the critical $\beta$ values)

Looking at this diagram, we see large gaps between the critical $\beta$ values at 2, 4, 8, and 16. These show that, depending on the resolution at which you want to look at the dataset, there are 2, 4, 8, or 16 clusters. The largest gap occurs at 16, so 16 can be taken as the true number of clusters in this dataset. To get an accurate plot, it is advised to use small values for alpha and purturb. Also, using:

{ 
  model.return_true_number()
}

returns the value 16 for this dataset.
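
The gap heuristic itself is simple to express. A hedged sketch, taking the list of critical $\beta$ values as input (how to retrieve that list from the model, and the exact cluster-count bookkeeping at each index, depend on the library's internals):

{ 
  import numpy as np

  def largest_gap(critical_betas):
      """Locate the largest gap between consecutive log(critical beta) values."""
      logs = np.log(np.sort(np.asarray(critical_betas, dtype=float)))
      i = int(np.argmax(np.diff(logs)))   # gap lies between logs[i] and logs[i + 1]
      return i
}

The cluster count at the left edge of this gap is then read off as the true number of clusters.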

The mesh_plot command shows which cluster center each data point belongs to. For example, for a given dataset of 15 clusters, we get:

{ 
  model.mesh_plot()
}

(Figure: mesh plot of cluster assignments)
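
If you want to build such an assignment plot yourself from the returned factors, here is a minimal matplotlib sketch for the 2-D case, using the $X$, $Y$, $P$ conventions above:

{ 
  import numpy as np
  import matplotlib.pyplot as plt

  labels = np.argmax(P, axis=0)                     # hard assignment of each data point
  plt.scatter(X[0], X[1], c=labels, s=5, cmap='tab20')
  plt.scatter(Y[0], Y[1], c='k', marker='x', s=80)  # cluster centers
  plt.show()
}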

The Mass-Constrained DA captures unbalanced datasets in which one or more clusters have significantly fewer data points than the others. The weight of each cluster can be shown using the pie_chart command, which is very useful for outlier-detection applications. For example, for a dataset like the one shown below:

(Figure: dataset containing an outlier cluster)

{ 
  model.pie_chart()
}

(Figure: pie chart of cluster weights)

The resulting pie chart indicates that the cluster $Y_6$ is an outlier.
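
The cluster weights behind the pie chart can also be read directly off $P$; a small sketch, assuming the columns of $P$ sum to one and using an illustrative outlier threshold:

{ 
  import numpy as np

  weights = P.sum(axis=1) / P.shape[1]     # fraction of association mass per center
  print({f'Y_{j}': round(float(w), 3) for j, w in enumerate(weights)})
  outliers = np.where(weights < 0.5 * weights.mean())[0]   # illustrative threshold
  print(outliers)                          # centers carrying unusually little mass
}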


Orthogonal Non-negative Matrix Factorization (ONMF)

The orthogonal NMF (ONMF) poses the following problem:

$$ \begin{align} \min \quad &D(X,WH) \\ \textrm{s.t.} \quad &W^{\intercal}W=I \quad \text{or} \quad HH^{\intercal}=I \\ & W_{ij},H_{ij} \geq 0 \quad \forall \ i,j \end{align} $$

DA can also be used to solve the ONMF problem.
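
The link is the standard equivalence between orthogonal NMF and clustering: a hard, row-normalized assignment matrix is automatically row-orthogonal. A minimal sketch of how the DA clustering output could be mapped to $H$-orthogonal factors (an illustration of the idea, not the package's internal code):

{ 
  import numpy as np

  def factors_from_clustering(X, Y, P):
      """Build H-orthogonal factors W (d x k), H (k x n) from DA output."""
      labels = np.argmax(P, axis=0)               # harden the associations
      H = np.zeros_like(P)
      for j in range(P.shape[0]):
          idx = np.where(labels == j)[0]
          if idx.size:
              H[j, idx] = 1.0 / np.sqrt(idx.size) # unit-norm rows -> H H^T = I
      W = X @ H.T                                 # least-squares W given orthonormal rows of H
      return W, H
}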

| input | description |
| --- | --- |
| X | data matrix $X \in \mathbb{R}^{d \times n}_{+}$, where $d$ is the dimension of each data point and $n$ is the number of data points |

| output | description |
| --- | --- |
| W | features matrix $W \in \mathbb{R}^{d \times k}_{+}$ |
| H | mixing matrix $H \in \mathbb{R}^{k \times n}_{+}$ |
| model | DA model associated with the data matrix; can be used to identify the true number of features |
{ 
  from onmf_DA import ONMF_DA

  # alpha, purturb, and tol play the same roles as in the DA clustering above
  W, H, model = ONMF_DA.func(X, k, alpha=alpha, purturb=p, tol=tol)
}
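
A self-contained usage sketch with illustrative data and parameter values (all assumed, carried over from the clustering section):

{ 
  import numpy as np
  from onmf_DA import ONMF_DA

  X = np.abs(np.random.default_rng(0).normal(size=(20, 500)))  # nonnegative d x n demo matrix
  W, H, model = ONMF_DA.func(X, 4, alpha=1.05, purturb=0.01, tol=1e-4)
  print(np.linalg.norm(X - W @ H))   # reconstruction error
}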

To obtain the $H$-orthogonal factorization, pass $X$ as the input; for the $W$-orthogonal factorization, pass $X^{\intercal}$.
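
Concretely, if the routine returns $X^{\intercal} \approx W'H'$ with $H'H'^{\intercal}=I$, then transposing gives $X \approx H'^{\intercal}W'^{\intercal}$, so the $W$-orthogonal factors can be recovered as follows (a sketch under the signature shown above):

{ 
  # W-orthogonal factorization: run the routine on X^T, then transpose back
  Wt, Ht, model = ONMF_DA.func(X.T, k, alpha=1.05, purturb=0.01, tol=1e-4)
  W = Ht.T   # columns orthonormal: W^T W = Ht Ht^T = I
  H = Wt.T
}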