A PyTorch implementation of Non-negative Matrix Factorization.
This package is published on PyPI:

```
pip install nmf-torch
```
Given a non-negative numeric matrix `X` of shape M-by-N (M is the number of samples, N the number of features), as either a NumPy array or a torch tensor, the following code

```python
from nmf import run_nmf

H, W, err = run_nmf(X, n_components=20)
```

will decompose `X` into two new non-negative matrices:

- `H` of shape (M, 20), representing the transformed coordinates of samples with respect to the 20 components;
- `W` of shape (20, N), representing the composition of each component in terms of features;

along with the loss between `X` and its approximation `H * W`.
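For instance, here is a minimal end-to-end sketch using random placeholder data (the matrix shapes and the float32 cast are illustrative assumptions, not requirements):

```python
import numpy as np
from nmf import run_nmf

# Placeholder data: 1000 samples, 500 non-negative features.
X = np.random.rand(1000, 500).astype(np.float32)

H, W, err = run_nmf(X, n_components=20)
print(H.shape)  # (1000, 20)
print(W.shape)  # (20, 500)
print(err)      # loss between X and its approximation H * W
```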
By default, the `run_nmf` function uses the batch HALS solver for the NMF decomposition. In total, three solvers are available in NMF-torch (a usage sketch follows the list):
- HALS: Hierarchical Alternating Least Squares ([Kimura et al., 2015]). The default.
- MU: Multiplicative Update. Set `algo='mu'` in the `run_nmf` function.
- BPP: Alternating non-negative least squares with the Block Principal Pivoting method ([Kim & Park, 2011]). Set `algo='bpp'` in the `run_nmf` function.
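For example, a sketch of selecting the BPP solver (only the `algo` value differs from the default call):

```python
from nmf import run_nmf

# Use the Block Principal Pivoting solver instead of the default HALS.
H, W, err = run_nmf(X, n_components=20, algo='bpp')
```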
Besides, each solver has two modes: batch and online. The online mode is a modified version that scales to input matrices with a large number of samples. You can set `mode='online'` in the `run_nmf` function to switch to the online mode.
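A sketch of switching to the online mode (the parameter name and value are as documented above):

```python
from nmf import run_nmf

# Online mode is the scalable variant for matrices with many samples.
H, W, err = run_nmf(X, n_components=20, mode='online')
```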
The default beta loss is the Frobenius (L2) distance, which is the most commonly used. By changing the `beta_loss` parameter in the `run_nmf` function, users can specify other beta loss metrics:
- KL divergence: `beta_loss='kullback-leibler'` or `beta_loss=1.0`;
- Itakura-Saito divergence: `beta_loss='itakura-saito'` or `beta_loss=0`;
- Or any non-negative float number.
Notice that since the online mode only works for the L2 loss, if you specify another beta loss, `run_nmf` will automatically switch back to the batch mode.
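For instance, a sketch of fitting with KL divergence (per the remark above, this will run in batch mode):

```python
from nmf import run_nmf

# KL divergence; equivalent to beta_loss=1.0.
# Only the L2 loss supports online mode, so this runs in batch mode.
H, W, err = run_nmf(X, n_components=20, beta_loss='kullback-leibler')
```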
For the other parameters of the `run_nmf` function, please type `help(run_nmf)` in your Python interpreter to view them.
For data integration with integrative NMF (iNMF), we have a list of k batches, with their corresponding non-negative numeric matrices to be integrated. Let `X` be such a list, where all matrices in `X` have the same number of features; i.e., each Xi in `X` has shape (Mi, N), where Mi is the number of samples in batch i and N is the number of features.
The following code:
```python
from nmf import integrative_nmf

H, W, V, err = integrative_nmf(X, n_components=20)
```
will perform iNMF, which results in the following non-negative matrices:
- `H`: list of matrices of shape (Mi, 20), each of which represents the transformed coordinates of samples with respect to the components of the corresponding batch;
- `W` of shape (20, N), representing the common composition (shared information) across the given batches in terms of features;
- `V`: list of matrices, all of shape (20, N), each of which represents the batch-specific composition in terms of features of the corresponding batch;

along with the overall L2 loss between Xi and its approximation Hi * (W + Vi) for each batch i.
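As an illustration, a minimal sketch with random placeholder batches (the batch sizes and feature count are arbitrary assumptions):

```python
import numpy as np
from nmf import integrative_nmf

# Three placeholder batches with different sample counts but the same
# number of features (N = 500).
X = [np.random.rand(m, 500).astype(np.float32) for m in (800, 1000, 1200)]

H, W, V, err = integrative_nmf(X, n_components=20)
print(len(H), H[0].shape)  # 3 batches; first H has shape (800, 20)
print(W.shape)             # (20, 500)
print(len(V), V[0].shape)  # 3 batches; each V has shape (20, 500)
```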
Similar to the `run_nmf` function above, `integrative_nmf` provides 2 modes (batch and online) and 3 solvers: HALS, MU, and BPP. By default, batch HALS is used. You can switch to other solvers and modes by specifying the `algo` and `mode` parameters.
There is another important parameter, `lam`, the coefficient for the regularization terms, with default value `5.0`. If set to `0`, no regularization will be applied.
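For example, a sketch combining these parameters (the specific values here are arbitrary choices for illustration):

```python
from nmf import integrative_nmf

# Online BPP solver with a custom regularization coefficient.
H, W, V, err = integrative_nmf(X, n_components=20, algo='bpp',
                               mode='online', lam=10.0)
```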
Notice that only the L2 loss is accepted in iNMF.

For the other parameters of the `integrative_nmf` function, please type `help(integrative_nmf)` in your Python interpreter to view them.