livnatje/DIALOGUE

Issue with t() for sparse tpm matrices


Dear DIALOGUE team,

I want to run make.cell.type with tpm as a sparse matrix (dgRMatrix or dgCMatrix). When I call the make.cell.type that ships with the loaded package, I get the following error (after making tpm dense with as.matrix(tpm), everything works):

Error in t.default(tpm): argument is not a matrix
Traceback:

1. make.cell.type(name = cell_type, tpm = tpm, samples = samples, 
 .     X = pca, metadata = adata_ct$obs[c("Pool_ID")], cellQ = adata_ct$obs$QC_total_UMI, 
 .     )
2. cell.type(name = gsub("_", "", name), cells = colnames(tpm), 
 .     genes = rownames(tpm), cellQ = cellQ, tpm = tpm, tpmAv = tpmAv, 
 .     qcAv = aggregate(x = cellQ, by = list(samples), FUN = mean), 
 .     X = X, samples = samples, metadata = cbind.data.frame(cellQ = cellQ, 
 .         metadata), extra.scores = list())
3. new(structure("cell.type", package = "DIALOGUE"), ...)
4. initialize(value, ...)
5. initialize(value, ...)
6. t(average.mat.rows(t(tpm), samples))
7. average.mat.rows(t(tpm), samples)
8. laply(ids.u, function(x) {
 .     return(f(m[is.element(ids, x), ]))
 . })
9. llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
 .     .inform = .inform, .parallel = .parallel, .paropts = .paropts)
10. structure(lapply(pieces, .fun, ...), dim = dim(pieces))
11. lapply(pieces, .fun, ...)
12. FUN(X[[i]], ...)
13. f(m[is.element(ids, x), ])
14. is.data.frame(x)
15. t(tpm)
16. t.default(tpm)

But if I copy the code for average.mat.rows, get.abundant, cell.type and make.cell.type into my own session, it runs fine!

Absolutely not an R expert, but could it be that something within DIALOGUE changes the transpose function to one that can't handle sparse matrices for some reason?

In case it helps, this is the code I am running:

library(DIALOGUE)
library(anndata)  # provides read_h5ad()

cell_type <- "B"

adata_ct <- read_h5ad(file)

tpm <- t(adata_ct$X)  # dgRMatrix

pca <- adata_ct$obsm[['X_pca']]
rownames(pca) <- colnames(tpm)

samples <- adata_ct$obs$scRNASeq_sample_ID

make.cell.type(
    name = cell_type,
    tpm = tpm,
    samples = samples,
    X = pca,
    metadata = adata_ct$obs[c("Pool_ID")],
    cellQ = adata_ct$obs$QC_total_UMI
)

OK, I think I understand it now. Without the Matrix package, t() can only handle dense matrices, and that appears to be the version used here:

tpmAv = t(average.mat.rows(t(tpm),samples)),

In contrast, if I load DIALOGUE, Matrix is loaded as well, so when I then define make.cell.type again in my own session, my copy gets the fancy t() version that can handle my sparse matrix.
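To illustrate the difference, here is a minimal sketch, assuming only that the Matrix package is installed. The small matrix m is made up and merely stands in for tpm. It calls the base default method (the one the traceback ends in) and the generic exported by Matrix side by side:

```r
library(Matrix)

# Small sparse matrix standing in for tpm (illustrative values only)
m <- sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(5, 7), dims = c(2, 3))

# base::t.default is the method reached at the bottom of the traceback;
# it does not know about S4 sparse classes, so the call fails
base_result <- tryCatch(base::t.default(m), error = function(e) "error")

# The t() generic exported by Matrix dispatches on the sparse class
sparse_result <- Matrix::t(m)
dim(sparse_result)
```

If this is indeed what happens, code inside the package would need to see the Matrix generic (for example by importing it) rather than the base function.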
Any chance sparse support could be incorporated here, @livnatje?
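As a possible workaround until then, the per-sample averages could be computed with a sparse matrix product instead of going through t() at all. This is only a sketch: average_by_sample is a hypothetical helper, not a DIALOGUE function, and it assumes tpm is genes x cells with one entry of samples per cell. Note that the columns come out in factor-level order, which may differ from the order the package produces.

```r
library(Matrix)

# Hypothetical helper (not part of DIALOGUE): average the columns (cells) of a
# genes x cells matrix within each sample, without densifying the input.
average_by_sample <- function(tpm, samples) {
  stopifnot(ncol(tpm) == length(samples))
  samples <- factor(samples)         # also drops unused levels
  tpm <- as(tpm, "CsparseMatrix")    # accepts dgRMatrix, dgCMatrix or dense input
  # cells x samples indicator matrix: entry (i, j) is 1 if cell i is in sample j
  ind <- sparseMatrix(
    i = seq_along(samples), j = as.integer(samples), x = 1,
    dims = c(length(samples), nlevels(samples))
  )
  sums <- as.matrix(tpm %*% ind)     # genes x samples, small enough to be dense
  counts <- tabulate(as.integer(samples), nbins = nlevels(samples))
  avg <- sweep(sums, 2, counts, "/") # divide each column by its cell count
  dimnames(avg) <- list(rownames(tpm), levels(samples))
  avg
}

# Toy data: 2 genes, 4 cells, 2 samples (made-up values)
tpm_toy <- sparseMatrix(
  i = c(1, 1, 2, 2), j = c(1, 2, 3, 4), x = c(2, 4, 6, 10),
  dims = c(2, 4), dimnames = list(c("g1", "g2"), paste0("c", 1:4))
)
avg <- average_by_sample(tpm_toy, c("s1", "s1", "s2", "s2"))
```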

I think allowing a sparse tpm would also require changing

R$cca.gene.cor1[[x]]<-cor(t(r@tpm),r@scores)

to

cvals <- corSparse(t(r@tpm),r@scores)
rownames(cvals) <- rownames(r@tpm)
colnames(cvals) <- colnames(r@scores)
R$cca.gene.cor1[[x]]<-cvals

using corSparse from the qlcMatrix package (which I installed from GitHub since it is no longer on CRAN, but maybe there is another sparse correlation function that I didn't see).
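If depending on qlcMatrix is a problem, the same correlation can probably be computed with Matrix alone. The following is a rough sketch under that assumption: sparse_dense_cor is a hypothetical name, and it uses the identity cov(x, y) = (x'y - n * mean(x) * mean(y)) / (n - 1), so that only the sparse cross product touches the large matrix. Columns with zero variance would give NaN or Inf here and would need separate handling.

```r
library(Matrix)

# Hypothetical helper: Pearson correlation between the columns of a sparse
# matrix x (cells x genes) and a dense matrix y (cells x scores).
sparse_dense_cor <- function(x, y) {
  y <- as.matrix(y)
  stopifnot(nrow(x) == nrow(y))
  n <- nrow(x)
  mx <- colMeans(x)                  # Matrix method, keeps x sparse
  my <- colMeans(y)
  # covariance: (x'y - n * mean_x mean_y') / (n - 1)
  cov_xy <- (as.matrix(crossprod(x, y)) - n * outer(mx, my)) / (n - 1)
  # column standard deviations of x from sums of squares
  sx <- sqrt((colSums(x^2) - n * mx^2) / (n - 1))
  sy <- apply(y, 2, sd)
  out <- cov_xy / outer(sx, sy)
  dimnames(out) <- list(colnames(x), colnames(y))
  out
}

# Toy data: 5 cells, 2 genes, 2 scores (made-up values)
x_toy <- Matrix(c(1, 0, 2, 0, 3, 0, 4, 0, 5, 1), nrow = 5, sparse = TRUE,
                dimnames = list(NULL, c("g1", "g2")))
y_toy <- cbind(s1 = c(1, 2, 3, 4, 5), s2 = c(2, 1, 4, 3, 6))
cvals_toy <- sparse_dense_cor(x_toy, y_toy)
```

In the snippet above, this would be called as sparse_dense_cor(t(r@tpm), r@scores), assuming t() resolves to the Matrix version at that point.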