YingqingHe/LVDM

GPU memory usage during training


wren93 commented

Hi, thank you for sharing this great work. I'm trying to train LVDM on UCF-101 for unconditional generation and observed weird GPU memory usage during training. With batch size = 2, nvidia-smi reports roughly 73,000 MB of GPU memory in use during training. However, when I increased the batch size to 32, the usage went down to ~35,000 MB. I tried to debug the code, and it seems the UNet is consuming a large amount of memory when the batch size is small (allocated memory jumps from 8 GB to ~73 GB across line 626 - line 634 in lvdm/models/modules/openaimodel3d.py). I wonder if you have any insights on this issue. Thanks!
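
For reference, a minimal sketch of how the spike can be localized with the standard torch.cuda memory counters (the helper name and placement are illustrative, not part of the repo; note that nvidia-smi also counts memory cached by the allocator, so these numbers can differ from what nvidia-smi shows):

```python
import torch

def report_cuda_memory(tag: str) -> None:
    """Print currently allocated and peak allocated CUDA memory in GiB."""
    torch.cuda.synchronize()
    alloc = torch.cuda.memory_allocated() / 1024 ** 3
    peak = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"[{tag}] allocated={alloc:.2f} GiB, peak={peak:.2f} GiB")

# Usage inside the UNet forward, around the suspect block:
#   torch.cuda.reset_peak_memory_stats()
#   report_cuda_memory("before block")
#   ... ops at lines 626-634 of openaimodel3d.py ...
#   report_cuda_memory("after block")
```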

wren93 commented

Also, I'm wondering what the batch size and num_workers parameters under the "trainer" attribute in the config file are used for (as shown in the figure). It seems that the parameters that actually control the dataloader are the ones under "data"? See the sketch below.
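
A minimal sketch of what I assume the wiring looks like, based on a PyTorch Lightning-style setup similar to the latent-diffusion codebase (class and key names here are illustrative and may not match this repo exactly): the values under data.params are the ones that reach the DataLoader.

```python
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl

class DataModuleFromConfig(pl.LightningDataModule):
    def __init__(self, train_dataset: Dataset, batch_size: int, num_workers: int = 0):
        super().__init__()
        self.train_dataset = train_dataset
        self.batch_size = batch_size      # assumed to come from data.params.batch_size
        self.num_workers = num_workers    # assumed to come from data.params.num_workers

    def train_dataloader(self) -> DataLoader:
        # These values, not anything under "trainer", size the actual batches.
        return DataLoader(self.train_dataset,
                          batch_size=self.batch_size,
                          num_workers=self.num_workers,
                          shuffle=True)
```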