Using a data loader for performing tensor factorization
I'm a new user of LIANA+ and currently working with a large dataset that includes approximately 300 samples (around 1.5 million cells). I have access to 4 GPUs, each with 48 GB of memory. I'm considering using a data distributor to perform tensor factorization (c2c.analysis.run_tensor_cell2cell_pipeline) and would like to distribute the workload across all available GPUs.
Could you let me know if it's possible to utilize a data distributor in this context to fully leverage all the GPU resources?
Additionally, we have 112 CPU cores available. Is it feasible to execute this tensor factorization process on the CPUs instead of GPUs using the full set of cores?
Furthermore, I am interested in plotting the distribution of scores for each of the methods for the entire dataset, not just for a subset or a single sample. Is it possible to utilize all available CPU cores (112 cores) on our server to achieve this?
liana_res = adata.uns["liana_res"]
# Drop the rank columns, keeping only the raw per-method scores
liana_res = liana_res.loc[:, liana_res.columns[~liana_res.columns.str.contains(pat='rank')]]
# Reshape to long format: one row per interaction and scoring method
liana_res = liana_res.melt(id_vars=['source', 'target', 'ligand_complex', 'receptor_complex'], var_name='score', value_name='value')
liana_res['score'] = liana_res['score'].astype('category')
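For reference, this is roughly how I plan to plot those distributions afterwards (a sketch with seaborn, relying only on the long-format table above):

import seaborn as sns

# One histogram per scoring method; the methods are on different scales,
# so each facet gets its own axes
g = sns.displot(data=liana_res, x='value', col='score', col_wrap=4,
                facet_kws={'sharex': False, 'sharey': False})
g.savefig('score_distributions.png')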
Thank you!
Hi @Mona1982,
I believe a single GPU with 48GB of memory should be sufficient, given that the tensor-cell2cell step is run after the data have been aggregated by sample.
So, the resulting tensor has dimensions n samples × n interactions × n sender cell types × n receiver cell types - just so that you get an idea of its size. Thus, 48GB should be perfectly sufficient.
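As a rough back-of-the-envelope estimate (the numbers below are only illustrative, based on the dataset size you describe):

# Illustrative size of a dense tensor with ~300 samples, ~4000 LR pairs and ~22 cell types
n_samples, n_lrs, n_senders, n_receivers = 300, 4000, 22, 22
n_elements = n_samples * n_lrs * n_senders * n_receivers
print(f"float32: {n_elements * 4 / 1e9:.1f} GB, float64: {n_elements * 8 / 1e9:.1f} GB")
# -> roughly 2-5 GB, i.e. well within 48 GB of GPU memory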
Then the factorisation would summarize patterns across all cell types and samples, so it should be straightforward to plot following that step.
Here you can find extensive tutorials:
https://ccc-protocols.readthedocs.io/en/latest/
And instructions how to set up a gpu env:
https://github.com/saezlab/ccc_protocols/tree/main/env_setup
Hi @dbdimitrov, thanks for the reply. However, whenever I run the following code, I get an out-of-memory error and the process is killed.
tensor.shape
(299, 3712, 22, 22)
tensor = c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
                                                    meta_tensor,
                                                    copy_tensor=True,
                                                    rank=None,
                                                    tf_optimization='regular',
                                                    random_state=0,
                                                    elbow_metric='error',
                                                    smooth_elbow=False,
                                                    cmaps=None,
                                                    device='cpu',
                                                    output_fig=True,
                                                    output_folder=Results_folder)
Hmm, this is a bit odd. Can you check that CUDA is available in your environment, and also that you request enough video memory, via the nvidia-smi log? EDIT: https://enterprise-support.nvidia.com/s/article/Useful-nvidia-smi-Queries-2 - you can also check the free memory alone with memory.free.
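For example, a quick check (assuming PyTorch is the backend you use with tensorly):

import torch

print(torch.cuda.is_available())  # should be True if CUDA is visible to PyTorch
print(torch.cuda.device_count())  # number of GPUs PyTorch can see

# and from the shell, free memory per GPU:
# nvidia-smi --query-gpu=memory.free --format=csv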
If this looks good and you do indeed need multiple GPUs, then we would need to check whether tensorly supports parallel PyTorch GPU processes.
@earmingol any experience with running 300 samples with the tensor?
I have already checked all of those. I also created a subset containing only 100 samples from my adata, and for that subset CUDA worked fine without any problems. That's why I thought that, if I could use a data distributor, I could distribute the tensor factorization across the different resources based on samples.
Hi @Mona1982,
Great. Apologies for the basic suggestions; users with very different backgrounds use liana, so I have to double-check.
I'm currently on paternity leave and won't get a chance to test tensor-c2c on a multi-GPU session any time soon.
From the issues I've seen, tensorly should support multiple GPUs, and hence so should tensor-c2c. The only question is whether the way parafac is called is compatible with multi-GPU execution. Perhaps @earmingol or @hmbaghdassarian can let you know if they get a chance.
Alternatively, you could try a similar approach with MOFA+, inspired by tensor-cell2cell, which uses MOFA instead of non-negative PARAFAC; this one should also be runnable on CPU:
https://liana-py.readthedocs.io/en/latest/notebooks/mofatalk.html
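Very roughly, the idea is something like the sketch below (the argument names here are assumptions on my part - please check the notebook above for the exact API):

import liana as li
import muon as mu

# Reshape liana's by-sample results into per-cell-type-pair views (samples x interactions);
# sample_key / score_key below are assumed column names - see the tutorial
mdata = li.multi.lrs_to_views(adata, sample_key='sample', score_key='magnitude_rank')

# Factorize the views with MOFA+; this runs fine on CPU
mu.tl.mofa(mdata, n_factors=10)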
Sorry for the delay, I have been swamped with other things recently.
Unfortunately, data loaders would not work here because tensor factorization doesn't work with batches the way neural networks do. Our method is based on https://github.com/tensorly/tensorly, which performs the factorization with all elements simultaneously. Also, I am not aware of a tensor factorization algorithm that can handle batches for memory efficiency.
Regarding your question about using multiple GPUs: just using device="cuda" should be enough to call all GPUs together. However, I am not sure whether they pool all their memory into one big chunk, or just rely on the maximum memory available on a single GPU. I haven't experimented with more than one GPU.
Considering the size of your tensor, I would prioritize the LR pairs down to a more manageable number (maybe ~500?).
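For example, something along these lines (a sketch in pandas; 'magnitude_rank' assumes liana's rank_aggregate output, and the cutoff of 500 is arbitrary):

# Keep the ~500 LR pairs with the best (lowest) magnitude rank in any sample,
# then rebuild the tensor from this filtered table
liana_res = adata.uns['liana_res']
top_lrs = (liana_res.groupby(['ligand_complex', 'receptor_complex'])['magnitude_rank']
           .min()
           .nsmallest(500)
           .index)
mask = liana_res.set_index(['ligand_complex', 'receptor_complex']).index.isin(top_lrs)
liana_filtered = liana_res[mask]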
Hi, thanks for your detailed explanation. However, I have a new question: can I use external resources with LIANA+? I am interested in using NeuronChat as a resource for LIANA, but I don't know whether I can use it like the other resources mentioned in LIANA. Thanks for your help.
Hi @Mona1982,
If you are interested in metabolite-mediated CCC, you could follow this tutorial:
https://liana-py.readthedocs.io/en/latest/notebooks/sc_multi.html#Metabolite-mediated-CCC-from-Transcriptomics-Data
You could also pass NeuronChat as the sole resource to liana, though there might be some differences in how the two tools estimate the presence of metabolites.
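For example (a sketch; liana's resource argument expects a ligand/receptor table, but the CSV path and the original column names of your NeuronChat file below are hypothetical - please double-check against the liana-py docs):

import pandas as pd
import liana as li

# Load the NeuronChat interactions and rename the columns to the
# 'ligand' / 'receptor' pair that liana expects (original names are hypothetical)
neuronchat = pd.read_csv('neuronchat_interactions.csv')  # hypothetical path
neuronchat = neuronchat.rename(columns={'ligand_gene': 'ligand',
                                        'receptor_gene': 'receptor'})

# Pass it as a custom resource to any liana method, e.g. the rank aggregate;
# 'cell_type' is an assumed obs column name
li.mt.rank_aggregate(adata, groupby='cell_type', resource=neuronchat, use_raw=False)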
Hope this helps.