scverse/scvi-tools

scvi-tools training with multiple GPUs

shanzha9 opened this issue · 3 comments

Hi, developers,

The dataset used in my study consists of 2.6M cells, so training will take about a week. I wonder whether scvi-tools supports multi-GPU training, and whether there is an official tutorial.

Thank you!

There is multi-GPU support. However, your estimate looks vastly off, and I don't think your dataset is large enough to benefit much from multiple GPUs. For example, on the human CELLXGENE census dataset of 35 million cells, training for 100 epochs took less than 2 days (we actually had strikingly similar results after a couple of hours and 20 epochs). I would recommend increasing batch_size to 1024 (time scales roughly linearly with batch size) and reducing the number of training epochs to 50 (what I usually use); you should then be able to train in about 2 hours. Let me know if it takes more than 8 hours, as that would suggest the installation is broken or your object is not correctly formatted.
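The 2-hour figure can be sanity-checked with quick arithmetic from the numbers in this comment (the variable names and the 48-hour upper bound are my own framing, not measured values):

```python
# Back-of-envelope check of the training-time suggestion above.
# Reference point from the comment: 35M cells, 100 epochs, under 2 days.
ref_cells = 35_000_000
ref_epochs = 100
ref_hours = 48  # "less than 2 days", so an upper bound

cell_epochs_per_hour = ref_cells * ref_epochs / ref_hours

# A 2.6M-cell dataset trained for the suggested 50 epochs:
est_hours = 2_600_000 * 50 / cell_epochs_per_hour
print(f"estimated training time: {est_hours:.1f} h")
```

This lands at roughly 1.8 hours, consistent with the "about 2 hours" estimate, assuming throughput per cell-epoch is comparable between the two runs.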

@canergen Hi, thanks for your reply.

Could you please link the official multi-GPU tutorial? The data to be trained on are raw counts. I set

scvi.settings.dl_num_workers = 30
scvi.settings.num_threads = 30
scvi.settings.batch_size = 2048

but %CPU stays under 20% per process. Could something be wrong with the dataloader?

Low CPU load is usually not a problem if you reach high GPU usage. It just tells you that there is less work to do and it doesn't require 30 workers. The copy speed to the GPU is the limiting step and doesn't benefit much from more workers. Multi-GPU support is currently experimental, and we will release a tutorial with scvi-tools 1.3 (planned for the end of this year).
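Since the tutorial isn't out yet, here is a hedged sketch of what a multi-GPU call might look like: scvi-tools forwards extra `train()` keyword arguments to the underlying Lightning Trainer, but the exact kwargs below (especially `strategy` and `devices`) are assumptions on my part, not the documented API for the experimental feature:

```python
# Hypothetical sketch only; not from an official scvi-tools tutorial.
# scvi-tools passes trainer-level kwargs through model.train() to Lightning,
# so a multi-GPU run would look roughly like this.
train_kwargs = {
    "max_epochs": 50,    # as suggested earlier in the thread
    "batch_size": 1024,  # time scales roughly linearly with this
    "accelerator": "gpu",
    "devices": 2,        # assumption: 2 GPUs with a DDP-style strategy
    "strategy": "ddp",
}
# model.train(**train_kwargs)  # uncomment on a machine with >= 2 GPUs
print(train_kwargs["devices"])
```

Until the official tutorial ships with 1.3, treat the strategy choice as provisional and verify against the release notes.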