Is Multi-GPU config enabled even when I'm using one GPU?
BlaiseMuhirwa opened this issue · 1 comments
BlaiseMuhirwa commented
I fired up a single A100 SXM and was going through the steps for training GPT-2 per this doc. When I run the following command:
# train on a single GPU
./train_gpt2cu \
-i "dev/data/fineweb10B/fineweb_train_*.bin" \
-j "dev/data/fineweb10B/fineweb_val_*.bin" \
-o log124M \
-e "d12" \
-b 64 -t 1024 \
-d 524288 \
-r 1 \
-z 1 \
-c 0.1 \
-l 0.0006 \
-q 0.0 \
-u 700 \
-n 5000 \
-v 250 -s 20000 \
-h 1
I get an error telling me that I should enable MPI:
MPI support is disabled. Please enable MPI support to use MPI-based NCCL-init method.
I think we should raise this error only when the user is actually trying to use more than one GPU.
BlaiseMuhirwa commented
Ah, never mind. I saw that we can set NO_MULTI_GPU to disable this!
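For anyone who hits the same error, here is a minimal sketch of how the rebuild might look. It assumes NO_MULTI_GPU is a variable read by the project's Makefile and that the build target has the same name as the binary above (train_gpt2cu); both are assumptions, so check the Makefile first. The build_cmd helper is hypothetical and only prints the command it would suggest, it does not run make.

```shell
# Hypothetical helper (not part of the repo): print a suggested build command
# for the number of GPUs you plan to train on.
# Assumption: the Makefile honors NO_MULTI_GPU=1 and has a train_gpt2cu target.
build_cmd() {
  num_gpus="$1"
  if [ "$num_gpus" -le 1 ]; then
    # Single GPU: build without the multi-GPU (MPI/NCCL) code path.
    echo "make train_gpt2cu NO_MULTI_GPU=1"
  else
    # More than one GPU: keep the default multi-GPU build.
    echo "make train_gpt2cu"
  fi
}

# Show the suggested command for a single-GPU machine.
build_cmd 1
```

After rebuilding with the printed command, the training command from the first comment should be usable unchanged on one GPU.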