- Non-negative Matrix Tri-Factorization for Co-clustering
- Brief Description of Models
- Requirements
- Datasets
- Model Implementation
- Cite
- Supplementary Material
- Presentation Video
- References
NMTFcoclust is a package that decomposes a data matrix 𝐗 (e.g., document-word counts, movie-viewer ratings, or product-customer purchase matrices) into three matrices:
- 𝐅 (row-cluster membership matrix)
- 𝐆 (column-cluster membership matrix)
- 𝐒 (summary matrix linking row clusters to column clusters)
The low-rank approximation of $\mathbf{X}$ is $\mathbf{X} \approx \mathbf{F}\mathbf{S}\mathbf{G}^{\top}$.
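To make the shapes concrete, here is a small NumPy sketch with random toy factors (all sizes are hypothetical). Reading the row and column cluster labels as the argmax of each row of 𝐅 and 𝐆 is a common convention for NMTF co-clustering; it is assumed here, not quoted from NMTFcoclust's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 6, 5   # size of the toy data matrix X (hypothetical)
k, l = 2, 3             # number of row clusters and column clusters

F = rng.random((n_rows, k))   # row-cluster memberships, shape (n_rows, k)
S = rng.random((k, l))        # summary matrix, shape (k, l)
G = rng.random((n_cols, l))   # column-cluster memberships, shape (n_cols, l)

# Low-rank approximation X ≈ F S G^T has the same shape as X and rank <= min(k, l)
X_hat = F @ S @ G.T

# Assumed convention: each row/column goes to the cluster with the largest membership
row_labels = F.argmax(axis=1)
col_labels = G.argmax(axis=1)
print(X_hat.shape, row_labels, col_labels)
```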
NMTFcoclust implements the proposed algorithm (OPNMTF) and several other NMTF algorithms, each defined by its own objective function.
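OPNMTF measures how well 𝐅𝐒𝐆ᵀ fits 𝐗 with an α-divergence (as the title of the cited paper states). As a sketch, here is the standard Amari α-divergence; the exact parametrization used in the paper and in NMTFcoclust, and the orthogonality penalties OPNMTF adds to it, are not reproduced here and may differ. `alpha_divergence` is a hypothetical helper, not part of the package.

```python
import numpy as np

def alpha_divergence(P, Q, alpha, eps=1e-12):
    """Amari alpha-divergence D_alpha(P || Q) for nonnegative arrays, alpha not in {0, 1}.

    D = 1 / (alpha * (1 - alpha)) * sum(alpha * P + (1 - alpha) * Q - P**alpha * Q**(1 - alpha))
    As alpha -> 1 it tends to the generalized KL divergence KL(P || Q).
    """
    if alpha in (0, 1):
        raise ValueError("alpha = 0 and alpha = 1 are KL limits; use a KL divergence instead")
    # Clip tiny values so that powers of zero stay finite when alpha is outside (0, 1)
    P = np.maximum(np.asarray(P, dtype=float), eps)
    Q = np.maximum(np.asarray(Q, dtype=float), eps)
    terms = alpha * P + (1 - alpha) * Q - P**alpha * Q**(1 - alpha)
    return terms.sum() / (alpha * (1 - alpha))

rng = np.random.default_rng(0)
P = rng.random((4, 5)); P /= P.sum()   # normalized to sum 1, like X_Classic3_sum_1
Q = rng.random((4, 5)); Q /= Q.sum()
print(alpha_divergence(P, Q, alpha=0.4))   # nonnegative
print(alpha_divergence(P, P, alpha=0.4))   # zero up to rounding
```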
numpy==1.18.3
pandas==1.0.3
scipy==1.4.1
matplotlib==3.0.3
scikit-learn==0.22.2.post1
coclust==0.2.1
Dataset | #Documents | #Words | Sparsity (% zeros) | Number of clusters |
---|---|---|---|---|
CSTR | 475 | 1000 | 96% | 4 |
WebACE | 2340 | 1000 | 91.83% | 20 |
Classic3 | 3891 | 4303 | 98% | 3 |
Sports | 8580 | 14870 | 99.99% | 7 |
Reviews | 4069 | 18483 | 99.99% | 5 |
RCV1_4Class | 9625 | 29992 | 99.75% | 4 |
NG20 | 19949 | 43586 | 99.99% | 20 |
20Newsgroups | 18846 | 26214 | 96.96% | 20 |
TDT2 | 9394 | 36771 | 99.64% | 30 |
RCV1_ori | 9625 | 29992 | 96.62% | 4 |
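The sparsity column is the percentage of zero entries in the document-word matrix. A minimal helper to compute it, assuming the matrix is either a dense NumPy array or a `scipy.sparse` matrix (for example `mydata['A']` as loaded from `classic3.mat`); `sparsity_percent` is a hypothetical name, not part of NMTFcoclust.

```python
import numpy as np
from scipy import sparse

def sparsity_percent(X):
    """Percentage of zero entries in a dense array or a scipy.sparse matrix."""
    n_rows, n_cols = X.shape
    # count_nonzero() ignores explicitly stored zeros in sparse matrices
    nnz = X.count_nonzero() if sparse.issparse(X) else np.count_nonzero(X)
    return 100.0 * (1 - nnz / (n_rows * n_cols))

print(sparsity_percent(np.eye(4)))                     # 4 nonzeros out of 16 -> 75.0
print(sparsity_percent(sparse.csr_matrix(np.eye(4))))  # same matrix stored sparsely -> 75.0
```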
```python
import pandas as pd
import numpy as np
from scipy.io import loadmat
from sklearn.metrics import confusion_matrix

# Read dataset -------> Classic3
file_name = r"NMTFcoclust/Dataset/Classic3/classic3.mat"
mydata = loadmat(file_name)

# Data matrix (documents x words), rescaled so that all entries sum to 1
X_Classic3 = mydata['A'].toarray()
X_Classic3_sum_1 = X_Classic3 / X_Classic3.sum()

true_labels = mydata['labels'].flatten().tolist()  # True labels list [0,0,0,..,1,1,1,..,2,2,2], n_row_cluster = 3
true_labels = [x + 1 for x in true_labels]         # True labels list [1,1,1,..,2,2,2,..,3,3,3], n_row_cluster = 3

# Confusion matrix of the true labels against themselves: the diagonal holds the size of each class
print(confusion_matrix(true_labels, true_labels))
```

Class sizes: Medical (1033), Information Retrieval (1460), Aeronautical Systems (1398).

```
[[1033    0    0]
 [   0 1460    0]
 [   0    0 1398]]
```
```python
from NMTFcoclust.Models.NMTFcoclust_OPNMTF_alpha_2 import OPNMTF
from NMTFcoclust.Evaluation.EV import Process_EV

# landa and mu weight the orthogonality penalties on the row and column clusters; alpha is the alpha-divergence parameter
OPNMTF_alpha = OPNMTF(n_row_clusters=3, n_col_clusters=3, landa=0.3, mu=0.3, alpha=0.4, max_iter=1)
OPNMTF_alpha.fit(X_Classic3_sum_1)

# Evaluate the row clusters against the true labels
Process_Ev = Process_EV(true_labels, X_Classic3_sum_1, OPNMTF_alpha)
```

```
Accuracy (Acc):0.9100488306347982
Normalized Mutual Info (NMI):0.7703948803438703
Adjusted Rand Index (ARI):0.7641161476685447
Confusion Matrix (CM):
[[1033    0    0]
 [ 276 1184    0]
 [   0   74 1324]]
Total Time: 26.558243700000276
```
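Acc and ARI can be recomputed from the confusion matrix alone. The sketch below uses the usual definitions (accuracy after the best one-to-one matching between clusters and classes, and the adjusted Rand index computed from the contingency table); the helper names are hypothetical and this is not the code inside `Process_EV`. On the OPNMTF confusion matrix for Classic3 it gives 3541/3891 ≈ 0.9100 and ARI ≈ 0.7641, in line with the reported values.

```python
import itertools
from math import comb

import numpy as np

def clustering_accuracy(cm):
    """Accuracy under the best one-to-one matching of clusters (columns) to classes (rows).

    Brute-forces all permutations, which is fine for a handful of clusters;
    scipy.optimize.linear_sum_assignment solves the same matching for many clusters.
    """
    cm = np.asarray(cm)
    if cm.shape[0] != cm.shape[1]:
        raise ValueError("expected a square confusion matrix")
    k = cm.shape[0]
    best = max(sum(cm[i, p[i]] for i in range(k)) for p in itertools.permutations(range(k)))
    return best / cm.sum()

def adjusted_rand_index(cm):
    """Adjusted Rand index from a contingency table (rows: classes, columns: clusters)."""
    cm = np.asarray(cm)
    n = int(cm.sum())
    sum_cells = sum(comb(int(x), 2) for x in cm.ravel())
    sum_rows = sum(comb(int(x), 2) for x in cm.sum(axis=1))
    sum_cols = sum(comb(int(x), 2) for x in cm.sum(axis=0))
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)

# Confusion matrix reported for OPNMTF on Classic3
cm = [[1033, 0, 0],
      [276, 1184, 0],
      [0, 74, 1324]]
print(clustering_accuracy(cm))   # 3541 / 3891 ≈ 0.9100
print(adjusted_rand_index(cm))   # ≈ 0.7641
```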
Please cite the following paper in your publications if you use NMTFcoclust in your research:
@article{NMTFcoclust,
title={Orthogonal Parametric Non-negative Matrix Tri-Factorization with $\alpha$-Divergence for Co-clustering},
DOI={10.1016/j.eswa.2023.120680},
volume = {231},
number = {120680},
journal={Expert Systems with Applications},
author={Saeid Hoseinipour and Mina Aminghafari and Adel Mohammadpour},
year={2023}
}
OPNMTF has also been applied to synthetic datasets generated from Bernoulli, Poisson, and truncated Gaussian distributions.
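The repository's own generators are not reproduced here. As a sketch under that assumption, a Poisson co-clustering dataset can be drawn from a latent block model, where each entry's mean depends only on its row cluster and its column cluster. `poisson_block_data` and its arguments are hypothetical.

```python
import numpy as np

def poisson_block_data(n_rows, n_cols, block_means, seed=0):
    """Sample a count matrix with a checkerboard (co-cluster) structure.

    block_means[a, b] is the Poisson mean for rows in cluster a and columns in cluster b.
    Row and column cluster labels are drawn uniformly at random.
    """
    rng = np.random.default_rng(seed)
    block_means = np.asarray(block_means, dtype=float)
    k, l = block_means.shape
    row_labels = rng.integers(k, size=n_rows)
    col_labels = rng.integers(l, size=n_cols)
    # Entry (i, j) gets the mean of block (row_labels[i], col_labels[j])
    means = block_means[np.ix_(row_labels, col_labels)]
    X = rng.poisson(means)
    return X, row_labels, col_labels

X, z, w = poisson_block_data(100, 80, [[5, 1, 1], [1, 5, 1]])
print(X.shape)   # (100, 80)
```

For a Bernoulli variant, `block_means` would hold probabilities in [0, 1] and `rng.poisson(means)` would become `rng.binomial(1, means)`.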
- Available from GitHub
- Available from ESWA
- Pre-review version
- Personalized URL providing 50 days' free access to the original article
- Industry Relations and Applications
- Our algorithm uses multiplicative update rules, and it converges.
- It adds two penalties to control the orthogonality of row and column clusters.
- It unifies a class of co-clustering algorithms based on $\alpha$-divergence.
- All datasets and algorithm code are available on GitHub in the NMTFcoclust repository.
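To illustrate what multiplicative update rules look like, the sketch below fits plain NMTF under the squared Frobenius loss with the classical multiplicative updates. It deliberately omits OPNMTF's α-divergence and orthogonality penalties, so it is not the package's algorithm. Each update multiplies a factor by a ratio of nonnegative terms, which keeps the factors nonnegative and, for this loss, does not increase the reconstruction error. Function names are hypothetical.

```python
import numpy as np

def nmtf_frobenius(X, k, l, n_iter=200, eps=1e-10, seed=0):
    """Fit X ≈ F S G^T with nonnegative factors by multiplicative updates
    for the squared Frobenius loss (no orthogonality penalties, no alpha-divergence)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k))   # row-cluster memberships
    S = rng.random((k, l))   # summary matrix
    G = rng.random((m, l))   # column-cluster memberships
    for _ in range(n_iter):
        # Each factor is multiplied by (negative part of the gradient) / (positive part)
        F *= (X @ G @ S.T) / (F @ (S @ (G.T @ G) @ S.T) + eps)
        G *= (X.T @ F @ S) / (G @ (S.T @ (F.T @ F) @ S) + eps)
        S *= (F.T @ X @ G) / ((F.T @ F) @ S @ (G.T @ G) + eps)
    return F, S, G

def reconstruction_error(X, F, S, G):
    """Frobenius norm of the residual X - F S G^T."""
    return np.linalg.norm(X - F @ S @ G.T)

X = np.random.default_rng(1).random((30, 20))
F, S, G = nmtf_frobenius(X, k=3, l=2)
print(reconstruction_error(X, F, S, G), F.argmax(axis=1))
```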