NMTFcoclust: A Python repository from Saeidhoseinipour

Non-negative Matrix Tri-Factorization for Co-clustering
Brief Description of Models
Requirements
Datasets
Model Implementation
Cite
Supplementary Material
Presentation Video
References

`NMTFcoclust` (Non-negative Matrix Tri-Factorization for Co-clustering)

NMTFcoclust is a package that decomposes a data matrix 𝐗 (for example, a document-word count, movie-viewer rating, or product-customer purchase matrix) into three matrices:

𝐅 (row-cluster membership matrix)
𝐆 (column-cluster membership matrix)
𝐒 (summary matrix linking row clusters to column clusters)

The low-rank approximation of $\mathbf{X}$ is $\mathbf{X} \approx \mathbf{FSG}^{\top}$.
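To make the shapes in the factorization concrete, here is a minimal NumPy sketch. The sizes, the random factors, and the `argmax` rule for reading off hard cluster labels are illustrative assumptions, not the package's own code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_cols = 6, 5          # size of a toy data matrix X
g, s = 2, 3                    # number of row clusters and column clusters

# Random non-negative factors (illustrative only, not fitted to any data).
F = rng.random((n_rows, g))    # row-cluster membership, one row per row of X
S = rng.random((g, s))         # summary matrix linking row and column clusters
G = rng.random((n_cols, s))    # column-cluster membership, one row per column of X

X_hat = F @ S @ G.T            # low-rank approximation with the same shape as X

# Hard labels: assign each row/column to the cluster with the largest membership.
row_labels = F.argmax(axis=1)
col_labels = G.argmax(axis=1)

print(X_hat.shape, row_labels, col_labels)
```

The product has rank at most min(g, s), which is what makes the approximation low-rank.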

Brief description of models

NMTFcoclust implements the proposed algorithm (OPNMTF) and several other NMTF models, with the objective functions listed below:

OPNMTF

$$D_{\alpha}(\mathbf{X}||\mathbf{FSG}^{\top})+ \lambda \; D_{\alpha}(\mathbf{I}_{g}||\mathbf{F}^{\top}\mathbf{F})+ \mu \; D_{\alpha}(\mathbf{I}_{s}||\mathbf{G}^{\top}\mathbf{G})$$
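The README does not spell out $D_{\alpha}$. The sketch below assumes the common Amari form of the $\alpha$-divergence, $D_{\alpha}(\mathbf{A}||\mathbf{B}) = \frac{1}{\alpha(\alpha-1)}\sum_{ij}\left(a_{ij}^{\alpha} b_{ij}^{1-\alpha} - \alpha a_{ij} + (\alpha-1) b_{ij}\right)$ for $\alpha \notin \{0, 1\}$, and evaluates the OPNMTF objective with it. The function names `alpha_divergence` and `opnmtf_objective` are hypothetical helpers, not part of the package.

```python
import numpy as np

def alpha_divergence(A, B, alpha, eps=1e-12):
    """Assumed Amari alpha-divergence between non-negative arrays (alpha not in {0, 1})."""
    A = np.asarray(A, dtype=float) + eps   # eps guards against division by zero
    B = np.asarray(B, dtype=float) + eps
    terms = A**alpha * B**(1.0 - alpha) - alpha * A + (alpha - 1.0) * B
    return terms.sum() / (alpha * (alpha - 1.0))

def opnmtf_objective(X, F, S, G, lam, mu, alpha):
    """Fit term plus the two orthogonality penalties from the OPNMTF formula."""
    fit = alpha_divergence(X, F @ S @ G.T, alpha)
    row_penalty = alpha_divergence(np.eye(F.shape[1]), F.T @ F, alpha)
    col_penalty = alpha_divergence(np.eye(G.shape[1]), G.T @ G, alpha)
    return fit + lam * row_penalty + mu * col_penalty

rng = np.random.default_rng(0)
X = rng.random((6, 5))
F = rng.random((6, 2))
S = rng.random((2, 3))
G = rng.random((5, 3))

value = opnmtf_objective(X, F, S, G, lam=0.3, mu=0.3, alpha=0.4)
print(value)
```

Under this assumed definition, $\alpha = 2$ reduces the divergence to $\sum_{ij}(a_{ij}-b_{ij})^{2}/(2 b_{ij})$, and the objective is zero when $\mathbf{X} = \mathbf{FSG}^{\top}$ holds exactly with orthonormal columns in $\mathbf{F}$ and $\mathbf{G}$.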

PNMTF

$$0.5||\mathbf{X}-\mathbf{F}\mathbf{S}\mathbf{G}^{\top}||^{2}+0.5 \tau \; Tr(\mathbf{F} \Psi_{g}\mathbf{F}^{\top})+0.5 \eta \; Tr(\mathbf{G} \Psi_{s}\mathbf{G}^{\top})+ 0.5 \gamma \; Tr(\mathbf{S}^{\top}\mathbf{S})$$

ONMTF

$$0.5 ||\mathbf{X}-\mathbf{F}\mathbf{S}\mathbf{G}^{\top}||^{2}$$

NBVD

$$||\mathbf{X}-\mathbf{FSG}^{\top}||^{2}$$

ONM3T

$$||\mathbf{X}-\mathbf{F}\mathbf{S}\mathbf{G}^{\top}||^{2}+ Tr(\Lambda (\mathbf{F}^{\top}\mathbf{F}-\mathbf{I}_{g}))+ Tr(\Gamma (\mathbf{G}^{\top}\mathbf{G}-\mathbf{I}_{s}))$$
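One way to read the orthogonality terms is that they favour membership matrices whose columns do not overlap. The sketch below uses hypothetical labels and is not taken from the package. It shows that a hard cluster indicator matrix, scaled by cluster size, satisfies $\mathbf{F}^{\top}\mathbf{F} = \mathbf{I}$ exactly, while an overlapping (soft) membership matrix does not.

```python
import numpy as np

labels = np.array([0, 0, 1, 2, 2, 2])   # hypothetical hard row-cluster labels
n_clusters = 3

# One-hot indicator matrix: Z[i, k] = 1 if row i belongs to cluster k.
Z = np.eye(n_clusters)[labels]

# Scale each column by 1/sqrt(cluster size) so every column has unit norm.
F = Z / np.sqrt(Z.sum(axis=0))

is_orthonormal = np.allclose(F.T @ F, np.eye(n_clusters))
print(is_orthonormal)   # True

# Adding a constant makes the clusters overlap and breaks orthogonality.
F_soft = F + 0.1
deviation = np.linalg.norm(F_soft.T @ F_soft - np.eye(n_clusters))
print(deviation > 0)    # True
```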

ODNMTF

$$||\mathbf{X}-\mathbf{FF^{\top}XGG}^{\top}||^{2}+ Tr(\Lambda \mathbf{F}^{\top})+ Tr( \Gamma \mathbf{G}^{\top})$$

DNMTF

$$||\mathbf{X}-\mathbf{FF^{\top}XGG}^{\top}||^{2}$$
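For the plain Frobenius objective $||\mathbf{X}-\mathbf{FSG}^{\top}||^{2}$ (the NBVD form above), the textbook multiplicative updates can be sketched as follows. This is a generic illustration under the assumption of no orthogonality constraints or penalties. The function `nmtf_multiplicative` is hypothetical and is not the repository's implementation of any of the listed models.

```python
import numpy as np

def nmtf_multiplicative(X, g, s, n_iter=200, eps=1e-10, seed=0):
    """Unconstrained NMTF via multiplicative updates on ||X - F S G^T||^2."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, g)) + eps
    S = rng.random((g, s)) + eps
    G = rng.random((m, s)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is multiplied by (negative gradient part) / (positive part),
        # which keeps all entries non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T) ** 2)
    return F, S, G, errors

X = np.random.default_rng(1).random((8, 6))
F, S, G, errors = nmtf_multiplicative(X, g=2, s=3)
print(errors[0], errors[-1])
```

The penalised and orthogonal variants listed above add extra terms to the numerators and denominators of these updates; those terms are not reproduced here.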

Requirements

numpy==1.18.3
pandas==1.0.3
scipy==1.4.1
matplotlib==3.0.3
scikit-learn==0.22.2.post1
coclust==0.2.1

Datasets

Dataset	#Documents	#Words	Sparsity (%)	Number of clusters
CSTR	475	1000	96%	4
WebACE	2340	1000	91.83%	20
Classic3	3891	4303	98%	3
Sports	8580	14870	99.99%	7
Reviews	4069	18483	99.99%	5
RCV1_4Class	9625	29992	99.75%	4
NG20	19949	43586	99.99%	20
20Newsgroups	18846	26214	96.96%	20
TDT2	9394	36771	99.64%	30
RCV1_ori	9625	29992	96.62%	4
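The sparsity column is presumably the percentage of zero entries in each document-word matrix. Under that assumption, it could be computed with a small helper such as the hypothetical `sparsity_percent` below.

```python
import numpy as np

def sparsity_percent(X):
    """Percentage of entries of X that are exactly zero."""
    X = np.asarray(X)
    return 100.0 * (1.0 - np.count_nonzero(X) / X.size)

# Toy 3 x 4 count matrix with 3 non-zero entries out of 12.
X_toy = np.array([[0, 2, 0, 0],
                  [0, 0, 0, 1],
                  [3, 0, 0, 0]])

print(sparsity_percent(X_toy))  # 75.0
```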

import pandas as pd
import numpy as np
from scipy.io import loadmat
from sklearn.metrics import confusion_matrix

# Read dataset: Classic3
file_name = r"NMTFcoclust\Dataset\Classic3\classic3.mat"
mydata = loadmat(file_name)

# Data matrix, normalised so that all entries sum to 1
X_Classic3 = mydata['A'].toarray()
X_Classic3_sum_1 = X_Classic3 / X_Classic3.sum()

# True labels, shifted from [0,0,0,..,1,1,1,..,2,2,2] to [1,1,1,..,2,2,2,..,3,3,3]; n_row_cluster = 3
true_labels = mydata['labels'].flatten().tolist()
true_labels = [x + 1 for x in true_labels]

# Comparing the labels with themselves gives a diagonal matrix of class sizes
print(confusion_matrix(true_labels, true_labels))



        Medical:        [[1033    0    0]
 Information Retrieval: [   0 1460    0]
 Aeronautical Systems:  [   0    0 1398]]
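The preprocessing steps in the snippet above (scaling the matrix so its entries sum to 1, shifting labels to start at 1, and counting class sizes) can be tried on toy data without the `.mat` file. The arrays below are made up for illustration, and `np.unique` is used as an alternative way to obtain the class sizes that appear on the diagonal.

```python
import numpy as np

# Toy count matrix standing in for the document-word matrix.
X_toy = np.array([[2.0, 0.0, 1.0],
                  [0.0, 3.0, 0.0],
                  [1.0, 0.0, 1.0]])
X_toy_sum_1 = X_toy / X_toy.sum()          # entries now sum to 1

# Toy zero-based labels, shifted to start at 1 as in the snippet above.
labels_zero_based = np.array([0, 0, 1, 2, 2])
labels_one_based = labels_zero_based + 1

# Class sizes, i.e. the diagonal of confusion_matrix(labels, labels).
classes, counts = np.unique(labels_one_based, return_counts=True)
class_sizes = dict(zip(classes.tolist(), counts.tolist()))
print(class_sizes)   # {1: 2, 2: 1, 3: 2}
```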

Model

from NMTFcoclust.Models.NMTFcoclust_OPNMTF_alpha_2 import OPNMTF
from NMTFcoclust.Evaluation.EV import Process_EV

OPNMTF_alpha = OPNMTF(n_row_clusters=3, n_col_clusters=3, landa=0.3, mu=0.3, alpha=0.4, max_iter=1)
OPNMTF_alpha.fit(X_Classic3_sum_1)
Process_Ev = Process_EV(true_labels, X_Classic3_sum_1, OPNMTF_alpha)



Accuracy (Acc):0.9100488306347982
Normalized Mutual Info (NMI):0.7703948803438703
Adjusted Rand Index (ARI):0.7641161476685447

Confusion Matrix (CM):
				[[1033    0    0]
				 [ 276 1184    0]
				 [   0   74 1324]]
Total Time:  26.558243700000276
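The reported accuracy can be checked by hand from the confusion matrix: the correctly assigned documents are on the diagonal, so Acc = (1033 + 1184 + 1324) / 3891 ≈ 0.9100. The sketch below generalises this by trying every mapping of predicted clusters to true classes. This brute-force search over permutations is an assumption about how clustering accuracy is usually defined, not a description of what `Process_EV` does internally, and `best_match_accuracy` is a hypothetical helper.

```python
import numpy as np
from itertools import permutations

# Confusion matrix reported above (rows: true classes, columns: predicted clusters).
cm = np.array([[1033,    0,    0],
               [ 276, 1184,    0],
               [   0,   74, 1324]])

def best_match_accuracy(cm):
    """Accuracy under the best one-to-one mapping of clusters to classes.

    Brute force over permutations, so only suitable for a handful of clusters.
    """
    k = cm.shape[0]
    best = max(sum(cm[i, p[i]] for i in range(k)) for p in permutations(range(k)))
    return best / cm.sum()

acc = best_match_accuracy(cm)
print(acc)
```

For larger numbers of clusters, an assignment solver such as the Hungarian algorithm would replace the permutation loop.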

[Figure: OPNMTF; the full-size image is available in ESWA.]

Cite

Please cite the following paper in your publication if you are using NMTFcoclust in your research:

@article{NMTFcoclust,
    title={Orthogonal Parametric Non-negative Matrix Tri-Factorization with $\alpha$-Divergence for Co-clustering},
    author={Saeid Hoseinipour and Mina Aminghafari and Adel Mohammadpour},
    journal={Expert Systems with Applications},
    volume={231},
    number={120680},
    year={2023},
    DOI={10.1016/j.eswa.2023.120680}
}

Supplementary material

OPNMTF is also applied to synthetic datasets such as Bernoulli, Poisson, and truncated Gaussian data.

Highlights

Our algorithm uses multiplicative update rules and is convergent.
We add two penalties that control the orthogonality of the row and column clusters.
We unify a class of co-clustering algorithms based on the $\alpha$-divergence.
All datasets and algorithm code are available on GitHub in the NMTFcoclust repository.