z-fabian/HUMUS-Net

Specified GPU problem


Hello, how do I specify which GPUs to use for training? I can't find the part of the code that needs to be changed. Thank you!

Hi,
The training and evaluation scripts run on GPUs by default using DDP (distributed data parallel). You can set the number of GPUs via the --gpus flag when running the scripts.

Thank you! But what if I have eight graphics cards (0,1,2,3,4,5,6,7), the first two (0,1) cannot be used, and only the last six (2,3,4,5,6,7) are available? How do I specify the last six graphics cards?

Hi,

In this case, set --gpus to 6 and set the CUDA_VISIBLE_DEVICES environment variable to limit the GPUs visible to Python. You can do this using

CUDA_VISIBLE_DEVICES=2,3,4,5,6,7 python3 humus_examples/train_humus_fastmri.py \
--config_file PATH_TO_CONFIG \
--data_path DATA_ROOT \
--default_root_dir LOG_DIR \
--gpus 6

or, if you are using a Jupyter notebook, you can set

%env CUDA_VISIBLE_DEVICES=2,3,4,5,6,7

before running your code.
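
One detail that is easy to trip over: once CUDA_VISIBLE_DEVICES is set, the visible GPUs are renumbered from 0 inside the process, so physical GPU 2 shows up as device 0. The sketch below sets the variable from Python and prints that mapping. The helper `visible_device_map` is a hypothetical name used only for illustration (it is not part of HUMUS-Net), and it assumes the variable holds plain integer indices rather than GPU UUIDs. The variable has to be set before CUDA is initialized, so in practice it should come before importing torch.

```python
import os


def visible_device_map(env_value):
    """Map logical CUDA index -> physical GPU index.

    Hypothetical helper for illustration; assumes a comma-separated
    list of integer indices such as "2,3,4,5,6,7".
    """
    physical = [int(tok) for tok in env_value.split(",") if tok.strip()]
    return dict(enumerate(physical))


# Set this before any CUDA-using library is imported or initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3,4,5,6,7"

mapping = visible_device_map(os.environ["CUDA_VISIBLE_DEVICES"])
for logical, physical in mapping.items():
    print(f"logical device {logical} -> physical GPU {physical}")
```

With six visible devices, --gpus 6 uses all of them, and any device index inside the code refers to the logical numbering (0 to 5).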

OK, thank you! I will try. One last question: how long did it take you to train on the fastMRI dataset with 8 graphics cards running at the same time? I spent a week and only got through 11 epochs with two graphics cards (3090).

It takes around 5-7 days on 8 GPUs to train the default-size network on the training dataset. If you have extra GPU memory, you can speed up training by checkpointing only a subset of the cascades here.
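
To illustrate the idea, here is a minimal sketch, assuming "checkpointing" refers to gradient (activation) checkpointing, where activations are recomputed in the backward pass to save memory at the cost of extra compute. The names `run_cascades`, `checkpointed` and `recording_checkpoint` are hypothetical and do not come from the HUMUS-Net code, and the eight toy cascades are only placeholders. In real training, `checkpoint_fn` would typically be `torch.utils.checkpoint.checkpoint` and each cascade an `nn.Module`.

```python
def run_cascades(cascades, x, checkpointed, checkpoint_fn):
    """Apply cascades in order; only the selected indices go through checkpoint_fn."""
    for idx, cascade in enumerate(cascades):
        if idx in checkpointed:
            # Memory-saving path: activations are recomputed during backward.
            x = checkpoint_fn(cascade, x)
        else:
            # Faster path: activations are kept in GPU memory.
            x = cascade(x)
    return x


# Stand-in for torch.utils.checkpoint.checkpoint that records its calls.
calls = []


def recording_checkpoint(fn, *args):
    calls.append(fn)
    return fn(*args)


# Toy cascades (placeholders): cascade k adds k to its input.
cascades = [lambda v, k=k: v + k for k in range(8)]
checkpointed = {0, 2, 4, 6}  # checkpoint every other cascade

result = run_cascades(cascades, 0, checkpointed, recording_checkpoint)
print(result, len(calls))
```

Checkpointing fewer cascades means fewer forward recomputations and therefore faster steps, but higher peak memory, so how many to leave un-checkpointed depends on the GPU.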