martinjzhang/scDRS

Preprocessing: killed running scdrs compute-score

schroeme opened this issue · 4 comments

Hello, I'm running scdrs compute-score per the instructions here, and I get the following error:

******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.2
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file adata.h5ad \
--h5ad-species human \
--cov-file None \
--gs-file magma_scz_top1000_zscore.gs \
--gs-species human \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data True \
--flag-raw-count False \
--n-ctrl 1000 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score True \
--out-folder marm_out

Loading data:
--h5ad-file loaded: n_cell=881832, n_gene=22582 (sys_time=218.2s)
First 3 cells: ['ATCTTCACAAGGCTTT-1', 'TCAAGCACATACTGAC-1', 'CAACCTCGTCCTACAA-1']
First 5 genes: ['LOC118152095', 'SLITRK6', 'LOC118152108', 'LOC103791423', 'SLITRK5']
--gs-file loaded: n_trait=1 (sys_time=218.2s)
Print info for first 3 traits:
First 3 elements for 'SCZ': ['DNAH10', 'DDX55', 'SNRNP35'], [8.2586, 8.0577, 7.845]

Preprocessing:
Killed

My data is already log1p-transformed on normalized counts. Any idea what might be causing this error? Is the file size too large, or am I violating any input requirements (I couldn't find any)?

Thanks!

Hi,

The most likely reason is that your file is too large. I once applied scDRS to a dataset with 500K cells, and it needed 96G of memory. I suggest:

  • Allocate at least 128G of memory and check if that solves the problem.
  • If you have limited memory, consider randomly splitting the dataset into smaller chunks and merging the results later (see the sketch below).
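Here is a minimal sketch of the split-and-merge idea (not part of scDRS itself; the file names, chunk count, and `out_chunk*` folders are illustrative placeholders). Each chunk is written as its own `.h5ad`, scored with `scdrs compute-score` as above, and the per-cell score tables are then concatenated:

```python
# Minimal sketch of the split-and-merge workaround; "adata.h5ad", n_chunks,
# and the out_chunk* folders are illustrative placeholders, not scDRS options.
import numpy as np
import pandas as pd
import anndata as ad

n_chunks = 4  # choose so each chunk fits comfortably in memory
adata = ad.read_h5ad("adata.h5ad")

# Randomly assign cells to chunks and write each chunk as its own .h5ad.
rng = np.random.default_rng(0)
for i, idx in enumerate(np.array_split(rng.permutation(adata.n_obs), n_chunks)):
    adata[np.sort(idx)].copy().write_h5ad(f"adata_chunk{i}.h5ad")

# After running `scdrs compute-score` on each chunk (writing to out_chunk{i}),
# merge the per-cell score tables; scDRS writes one <trait>.score.gz per trait
# (here the trait is SCZ).
scores = pd.concat(
    pd.read_csv(f"out_chunk{i}/SCZ.score.gz", sep="\t", index_col=0)
    for i in range(n_chunks)
)
scores.to_csv("SCZ.score.merged.gz", sep="\t")
```

Note that control-gene matching and score normalization are then computed within each chunk, so merged scores can differ slightly from a single full-data run.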

Hi @martinjzhang, thanks so much for the quick response! I have 256G RAM on my computer, so I should be able to allocate 128G easily. Is the memory allocation something I can change when calling compute-score? If not, how can I change it? Thanks!

Hi,

Allocating RAM to software should be done in the OS rather than within scDRS. Can you check whether your system already allocates all available memory to this program, or whether it sets an upper limit?
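For example, on Linux you can check whether a per-process cap is in place with `ulimit -a` at the shell, or from Python (a quick sketch):

```python
# Quick check (Linux): RLIM_INFINITY (-1) means no cap on the
# process's virtual memory; a finite value is an upper limit.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("soft:", soft, "hard:", hard, "(-1 means unlimited)")
```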

Also, did you store your single-cell data in sparse format? Specifically, is adata.X a sparse matrix? If not, converting it to one will further reduce memory usage.
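A quick way to check and convert (a minimal sketch; the output file name is just an example):

```python
# Minimal sketch: convert adata.X to CSR sparse if it is currently dense.
import anndata as ad
import scipy.sparse as sp

adata = ad.read_h5ad("adata.h5ad")
if not sp.issparse(adata.X):
    adata.X = sp.csr_matrix(adata.X)
    adata.write_h5ad("adata_sparse.h5ad")  # example output name
```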

Sparsifying adata.X (converting to raw counts) worked! Thank you!