lanl/pyDNMFk

Matrix input file format questions

ChadBurdyshaw opened this issue · 1 comment

First of all, thank you for making your software available as open source. I've installed this library for a research client, and they have their matrices stored as an AnnData object in CSR format in an HDF5 file. From what I can tell, your data_io.py module reads in matrices in csv, mat (MATLAB?), npy, and npz formats.
Is there a way (or a plan) to read AnnData CSR matrices from HDF5? Any recommendations for converting them to npy (CSV would be too large)?
And for the currently available formats, do they read CSR sparse matrices? Can the A matrix be preprocessed and split into multiple files for distribution, or does A have to be read from a single file and then distributed?

Currently, pyDNMFk supports only dense arrays. For sparse and GPU-accelerated computing, I recommend our library T-ELF: https://github.com/lanl/T-ELF . Examples are listed at https://github.com/lanl/T-ELF/tree/main/examples/NMFk . You can load your data and run NMFk with CSR arrays and GPUs as demonstrated in one of those examples.
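Because pyDNMFk currently expects dense input, one way to act on the conversion question is to load the AnnData matrix, check how large it would be once densified, and write it out as dense `.npy` row blocks. This is a minimal sketch under stated assumptions: loading relies on the `anndata` package's `read_h5ad` function, and the helper names (`load_h5ad_matrix`, `iter_row_blocks`, `save_row_blocks`, `dense_nbytes`), the `A_block_{i}.npy` file naming, and the contiguous row-block split are hypothetical choices for illustration. Whether pyDNMFk's `data_io.py` readers accept pre-split files, or expect a different (for example 2D grid) layout, is not confirmed here.

```python
import os

import numpy as np
import scipy.sparse as sp


def load_h5ad_matrix(path):
    """Load the X matrix from an AnnData .h5ad file as a SciPy CSR matrix.

    Assumes the `anndata` package is installed; it is imported lazily so the
    other helpers in this sketch work without it.
    """
    import anndata

    adata = anndata.read_h5ad(path)
    # .X is already CSR if it was stored that way; this also converts dense X.
    return sp.csr_matrix(adata.X)


def dense_nbytes(shape, dtype=np.float64):
    """Bytes needed to hold a matrix of `shape` as a dense array of `dtype`."""
    rows, cols = shape
    return rows * cols * np.dtype(dtype).itemsize


def iter_row_blocks(X, n_blocks):
    """Yield (start, stop, dense_block) for n_blocks contiguous row blocks of X.

    Block sizes differ by at most one row, and only one block is densified
    at a time, so peak memory is the sparse matrix plus one dense block.
    """
    X = sp.csr_matrix(X)
    base, extra = divmod(X.shape[0], n_blocks)
    start = 0
    for i in range(n_blocks):
        stop = start + base + (1 if i < extra else 0)
        yield start, stop, X[start:stop].toarray()
        start = stop


def save_row_blocks(X, n_blocks, out_dir, prefix="A_block"):
    """Save each dense row block as out_dir/{prefix}_{i}.npy (hypothetical naming)."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, (_, _, block) in enumerate(iter_row_blocks(X, n_blocks)):
        path = os.path.join(out_dir, f"{prefix}_{i}.npy")
        np.save(path, block)
        paths.append(path)
    return paths
```

For example, with a hypothetical file `data.h5ad`: call `X = load_h5ad_matrix("data.h5ad")`, check `dense_nbytes(X.shape) / 1e9` to see the dense size in GB, then call `save_row_blocks(X, 4, "A_blocks")`. Writing one block at a time avoids ever materializing the full dense matrix in memory, which matters when the dense size is large.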