Specified GPU problem

Question

Closed this issue a year ago · 5 comments

Hello, how do I specify which GPUs to use for training? I can't find the part of the code that needs to be changed. Thank you!

Answer 1 · 2023-01-13T23:20:42.000Z

Hi,
Training and evaluation scripts run on GPUs by default, using DDP (distributed data parallel). You can set the number of GPUs via the --gpus flag when running the scripts.

Answer 2 · 2023-02-07T05:04:53.000Z

Thank you! But suppose I have eight graphics cards (0,1,2,3,4,5,6,7), the first two (0,1) cannot be used, and only the last six (2,3,4,5,6,7) are available. How do I specify the last six graphics cards?

Answer 3 · 2023-02-07T18:46:52.000Z

Hi,

In this case, set --gpus to 6 and set the CUDA_VISIBLE_DEVICES environment variable to limit the GPUs visible to Python. You can do this using

CUDA_VISIBLE_DEVICES=2,3,4,5,6,7 python3 humus_examples/train_humus_fastmri.py \
--config_file PATH_TO_CONFIG \
--data_path DATA_ROOT \
--default_root_dir LOG_DIR \
--gpus 6

Or, if you are using a Jupyter notebook, you can set

%env CUDA_VISIBLE_DEVICES=2,3,4,5,6,7

before running your code.
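If you prefer to set this from inside a Python script rather than the shell, the same idea can be sketched as below. This is only an illustration, not part of the repository: the helper name `restrict_visible_gpus` is made up for this example. The variable has to be set before CUDA is initialized, so the safest place is at the very top of the script, before importing torch. Note that the visible GPUs are renumbered from 0, so physical GPU 2 becomes device 0 inside the process.

```python
import os


def restrict_visible_gpus(physical_ids):
    """Expose only the given physical GPU indices to CUDA in this process.

    Hypothetical helper for illustration. Call it before CUDA is
    initialized (in practice, before importing torch).
    Returns a dict mapping the logical device index seen by the process
    to the physical GPU index.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in physical_ids)
    # CUDA renumbers the visible devices starting from 0, in the listed order.
    return {logical: physical for logical, physical in enumerate(physical_ids)}


mapping = restrict_visible_gpus([2, 3, 4, 5, 6, 7])
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 2,3,4,5,6,7
print(mapping)  # {0: 2, 1: 3, 2: 4, 3: 5, 4: 6, 5: 7}
```

With six GPUs visible, --gpus 6 then uses all of them, which is why the count and the environment variable need to agree.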

Answer 4 · 2023-02-10T07:44:07.000Z

OK, thank you! I will try. One last question: how long did it take you to run the code on the fastMRI dataset with 8 graphics cards at the same time? I ask because I spent a week running only 11 epochs with two graphics cards (3090).

Answer 5 · 2023-02-10T17:52:32.000Z

It takes around 5-7 days on 8 GPUs to train the default size network on the training dataset. If you have extra GPU memory, you can speed up training by only checkpointing a subset of the cascades here.
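To illustrate the general idea, here is a minimal sketch that assumes "checkpointing" refers to activation (gradient) checkpointing with `torch.utils.checkpoint`, which saves memory by recomputing activations in the backward pass at the cost of extra compute. The `CascadedModel` class, its layers, and the `num_checkpointed` parameter are hypothetical stand-ins, not the repository's actual code.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CascadedModel(nn.Module):
    """Hypothetical stand-in for a cascaded (unrolled) network."""

    def __init__(self, num_cascades=8, num_checkpointed=4, width=16):
        super().__init__()
        self.cascades = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.ReLU())
             for _ in range(num_cascades)]
        )
        # Only the first `num_checkpointed` cascades are checkpointed.
        self.num_checkpointed = num_checkpointed

    def forward(self, x):
        for i, cascade in enumerate(self.cascades):
            if self.training and i < self.num_checkpointed:
                # Activations are recomputed during backward: less memory, more compute.
                x = checkpoint(cascade, x, use_reentrant=False)
            else:
                # Activations are stored as usual: more memory, faster backward.
                x = cascade(x)
        return x


model = CascadedModel(num_cascades=8, num_checkpointed=4)
model.train()
loss = model(torch.randn(4, 16)).sum()
loss.backward()
```

Lowering `num_checkpointed` trades memory for speed, so how far you can reduce it depends on how much spare GPU memory you have.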