AdaptiveMotorControlLab/CellSeg3D

How to engage GPU for WNET3D Training

Closed this issue · 2 comments

We tried to run the WNet3D training on our dataset but received the following error: "Training failed with exception: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 4294967296 bytes". We were able to confirm this was caused by the model using the CPU instead of our GPU, most likely because we installed the wrong version of PyTorch. Would fixing the PyTorch version resolve the GPU engagement issue, and would the CUDA 11.8 PyTorch build be preferred for an NVIDIA RTX 4500 graphics card, or are there further steps we need to take? Any help would be greatly appreciated!

Hello, thanks for trying out our plugin and models!

This indeed looks like a possible problem with torch and CUDA versions, as the GPU should be selected by default for training if available.

  • You are right, installing PyTorch with a CUDA version that matches your GPU's compute capability could fix the issue. For the RTX A4500, a CUDA version up to 12.x should be fine (see here).
  • Another thing to keep in mind is to have drivers installed that are compatible with your GPU and CUDA version.
  • Additionally, the error shows training trying to allocate 4 GB (4294967296 bytes) in a single go; are you running the training on entire images ? I would suggest splitting your data into cubes of 64 voxels per side for WNet training, as whole volumes can easily exceed available memory. You may want to check the utilities menu if you need to quickly fragment a large volume into cubes : see this documentation page for more info. A rough standalone sketch is also shown below this list.
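
Not the plugin's own utility, but purely as an illustration, a minimal sketch of splitting a volume into 64-voxel cubes with tifffile could look like the following (the file names are placeholders, and this simple version drops any leftover voxels at the borders):

    import tifffile

    CUBE = 64  # side length of each training cube, in voxels

    # Placeholder path; expects a single 3D (Z, Y, X) stack
    volume = tifffile.imread("my_volume.tif")

    count = 0
    for z in range(0, volume.shape[0] - CUBE + 1, CUBE):
        for y in range(0, volume.shape[1] - CUBE + 1, CUBE):
            for x in range(0, volume.shape[2] - CUBE + 1, CUBE):
                cube = volume[z:z + CUBE, y:y + CUBE, x:x + CUBE]
                tifffile.imwrite(f"cube_{count:04d}.tif", cube)
                count += 1

The utilities menu mentioned above does this for you inside the plugin, so this is only for reference if you prefer to preprocess outside napari.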

I hope this helps! Please let me know if I can help with the steps above, or if you have any further issues with the plugin.

Best,

Cyril


Side note: you may know this already, but here are two ways to check for GPU availability after reinstalling torch:

  1. You can see a list of available devices in the plugin when running training, under "Device"

  2. Otherwise you can quickly check by running ipython and then:
    import torch
    torch.cuda.is_available()
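
For a slightly fuller check (again just a generic torch snippet, not specific to the plugin), you can also print the CUDA build torch was installed with and the GPU it sees:

    import torch

    # True only if torch was built with CUDA support and a compatible driver/GPU is found
    print("CUDA available:", torch.cuda.is_available())
    # CUDA version the installed torch build was compiled against (None for CPU-only builds)
    print("torch CUDA build:", torch.version.cuda)
    if torch.cuda.is_available():
        # Name of the first visible GPU, e.g. the RTX A4500
        print("GPU:", torch.cuda.get_device_name(0))

If this prints False or None, the installed torch is likely the CPU-only build, and reinstalling a CUDA-enabled PyTorch build should fix it.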

Hello @c3rry, were you able to fix your GPU issues? I will close this for now, but please reopen the issue if you still need help.

Best,
Cyril