Performance issues on a large (120,724-location) dataset
Dillon214 opened this issue · 4 comments
Hello cell2location devs,
I am working with a large Visium dataset (12240 features, 120724 locations) and am running into issues when training the cell2location.models.Cell2location model. The specific issue is very simple: it's just too slow. I was getting a speed of about 10 seconds per iteration, and across 30000 iterations I was hitting estimated completion times of about 80 hours. I am running in GPU mode, and throwing additional A40 GPUs at the issue didn't seem to improve speed; I was using 8 with my latest run. I also tried batching the data so each batch was about 30,000 cells, which also didn't improve speed.
Do you have any advice for me? I would be greatly appreciative, as I have used this package before on a smaller dataset to great success, and it was done training in only a few hours.
sincerely,
Dillon Brownell
Hi @Dillon214
Exciting data you have. Please have a look at this issue for practical suggestions about working with large data #356
We find the best performance when training cell2location with a batch size equal to the full data. This means the batch size is limited by GPU memory: roughly 18k genes * 60k locations for an 80GB A100.
Are you using batch_size=None?
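For reference, a full-data training call following the pattern in the cell2location tutorial might look like the sketch below. This is only a sketch: the exact signatures (e.g. setup_anndata arguments, use_gpu vs. newer accelerator flags) depend on the installed cell2location/scvi-tools versions, and adata_vis / inf_aver stand in for the user's spatial AnnData and reference signatures.

```python
# Hedged sketch of full-data (batch_size=None) training, following the
# cell2location tutorial; check signatures against your installed version.
def train_full_batch(adata_vis, inf_aver):
    import cell2location  # assumed available in the analysis environment

    # Register the AnnData; "sample" is an assumed obs column name.
    cell2location.models.Cell2location.setup_anndata(
        adata=adata_vis, batch_key="sample"
    )
    mod = cell2location.models.Cell2location(
        adata_vis,
        cell_state_df=inf_aver,   # per-gene reference cell-type signatures
        N_cells_per_location=30,  # tutorial default; tune per tissue
        detection_alpha=20,
    )
    # batch_size=None loads the entire dataset onto the GPU once;
    # train_size=1 uses all locations for training.
    mod.train(max_epochs=30000, batch_size=None, train_size=1, use_gpu=True)
    return mod
```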
getting a speed of about 10 seconds per iteration
This is very slow. Could you confirm that the GPU is used?
throwing additional a40 cores at the issue didn't seem to improve speed, I was using 8 with my latest run
Cell2location doesn't support using multiple GPUs - only one GPU was used.
it was done training in only a few hours
This sounds expected.
Hi Vitkl,
Thanks for the speedy reply. And no, I was adjusting the batch_size argument to avoid out of memory errors. I'm no expert, but I'm guessing the size of the dataset exceeds the memory capacity of a single GPU. And yes, I can confirm that a GPU is being used. See attached image.
Based on the suggestions in the issue you posted, it seems like splitting the dataset into chunks, perhaps stratified by batch and other relevant variables, is a good way to evade the slowdown. Do you still recommend this? I tested this a bit, and it seemed to speed up processing dramatically. I suppose afterwards the results can be re-merged.
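A minimal sketch of this kind of split, using a toy obs table in place of a real adata.obs and an assumed "sample" column as the chunking key:

```python
import pandas as pd

# Toy stand-in for adata.obs; in practice this would be the real obs
# table with a sample/batch column.
obs = pd.DataFrame({"sample": ["A", "A", "B", "B", "B", "C"]})

# Group location indices by sample so each chunk can be trained
# separately (adata[idx].copy() would give the per-chunk AnnData).
chunks = {s: idx.tolist() for s, idx in obs.groupby("sample").groups.items()}
print(chunks)  # {'A': [0, 1], 'B': [2, 3, 4], 'C': [5]}
```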
-Dillon
Using a batch_size other than batch_size=None is indeed expected to take anywhere from several days to more than a week (10 seconds per epoch is actually pretty fast). More importantly, training with batch_size equal to the full data gives higher accuracy - so we don't recommend minibatch training.
it seems like splitting the dataset into chunks, perhaps stratified by batch and other relevant variables, is a good way to evade the slowdown. Do you still recommend this?
Yes, but you also need to set batch_size=None.
With batch_size=None the entire dataset is loaded into GPU memory once, while with minibatch training (batch_size=X), X observations are loaded into GPU memory at every training step.
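To give a rough sense of scale, the dense count matrix alone occupies a few GiB; this back-of-the-envelope sketch only counts the float32 data matrix, while actual training needs several times more memory for parameters and intermediate tensors:

```python
# Rough GPU-memory estimate for holding the full dense count matrix
# (float32) on device, as happens with batch_size=None. Data matrix
# only; model state and intermediates need considerably more.
def dense_matrix_gib(n_genes: int, n_locations: int, bytes_per_value: int = 4) -> float:
    return n_genes * n_locations * bytes_per_value / 1024**3

print(round(dense_matrix_gib(18_000, 60_000), 1))    # ~4.0 GiB (80GB A100 reference case)
print(round(dense_matrix_gib(12_240, 120_724), 1))   # ~5.5 GiB (this issue's dataset)
```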
Hi Vitkl,
I followed the code you posted in that other thread for subsetting and individually processing multiple objects, and experienced much better training times.
One final question: say I want to now compute expected expression per cell type, as was done in the tutorial pictured below and at the following link: https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html#Estimate-cell-type-specific-expression-of-every-gene-in-the-spatial-data-(needed-for-NCEM).
Would it be fine to perform this process for each individually processed object, then merge the results, or do you think a different approach should be taken?
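A hedged sketch of the merge step, assuming each chunk comes back as an AnnData object and that anndata.concat covers the relevant fields; whether a plain concatenation is statistically appropriate here is exactly the question above:

```python
def merge_chunk_results(chunk_adatas):
    # Sketch: re-merge per-chunk AnnData objects after each chunk has
    # been processed independently. Cell-abundance estimates live in
    # .obs / .obsm, which concat along observations preserves.
    import anndata  # assumed available

    return anndata.concat(chunk_adatas, join="outer", merge="same")
```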
-Dillon