[Finetuning OneFormer] How to use multiple GPUs
EricLe-dev opened this issue · 0 comments
EricLe-dev commented
Dear @NielsRogge. First and foremost, thank you so much for your fantastic work. I followed your tutorial and was able to finetune OneFormer. However, when I tried to finetune the model on multiple GPUs, it did not work.
I tried two approaches:
1. Using DataParallel
import torch.nn as nn
from torch.optim import AdamW
from torch.utils.data import DataLoader

# some code the same as your tutorial
processor.image_processor.num_text = model.config.num_queries - model.config.text_encoder_n_ctx
train_dataset = CustomDataset(processor)
train_dataloader = DataLoader(train_dataset, batch_size=1, shuffle=True, num_workers=16)

optimizer = AdamW(model.parameters(), lr=5e-5)

model = nn.DataParallel(model)
device = "cuda"
model.to(device)

model.train()
for epoch in range(20):  # loop over the dataset multiple times
    for batch in train_dataloader:
        # zero the parameter gradients
        optimizer.zero_grad()
        batch = {k: v.to(device) for k, v in batch.items()}

        # forward pass
        outputs = model(**batch)

        # backward pass + optimize
        loss = outputs.loss
        print("Loss:", loss.item())
        loss.backward()
        optimizer.step()
This code runs without errors, but only GPU 0 is utilized; the other GPUs do not seem to do any work.
Here is the result from nvidia-smi while it's running:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.239.06 Driver Version: 470.239.06 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:3B:00.0 Off | N/A |
| 55% 58C P2 196W / 356W | 20651MiB / 24268MiB | 71% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:3C:00.0 Off | N/A |
| 59% 57C P2 121W / 356W | 8MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:5E:00.0 Off | N/A |
| 53% 54C P2 120W / 356W | 8MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... Off | 00000000:86:00.0 Off | N/A |
| 53% 47C P2 118W / 356W | 8MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA GeForce ... Off | 00000000:D8:00.0 Off | N/A |
| 60% 58C P2 137W / 356W | 8MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA GeForce ... Off | 00000000:D9:00.0 Off | N/A |
| 60% 58C P2 111W / 356W | 8MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2170 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 2809467 C python 20643MiB |
| 1 N/A N/A 2170 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 2170 G /usr/lib/xorg/Xorg 4MiB |
| 3 N/A N/A 2170 G /usr/lib/xorg/Xorg 4MiB |
| 4 N/A N/A 2170 G /usr/lib/xorg/Xorg 4MiB |
| 5 N/A N/A 2170 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
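If it helps frame the question: my understanding is that `DataParallel` replicates the model and scatters the input along the batch dimension (dim 0), so with `batch_size=1` there is only one slice to hand out and the remaining GPUs stay idle. A toy sketch of that ceil-style split (mirroring what `torch.chunk` does during scatter; `split_batch` is my own illustrative helper, not a PyTorch API):

```python
def split_batch(batch_size, num_gpus):
    """Per-GPU shard sizes a ceil-division scatter (like torch.chunk) would produce."""
    per_gpu = -(-batch_size // num_gpus)  # ceiling division
    shards = []
    remaining = batch_size
    while remaining > 0:
        shards.append(min(per_gpu, remaining))
        remaining -= per_gpu
    return shards

print(split_batch(1, 6))   # [1] -> only one GPU receives data
print(split_batch(12, 6))  # [2, 2, 2, 2, 2, 2] -> all six GPUs get a shard
```

So if this is the explanation, would simply raising the batch size be the intended fix for `DataParallel`, or is something else going on?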
2. Using Accelerate
Following this tutorial, I modified the code as follows:
from accelerate import Accelerator
from torch.optim import AdamW
from torch.utils.data import DataLoader

processor.image_processor.num_text = model.config.num_queries - model.config.text_encoder_n_ctx
train_dataset = CustomDataset(processor)
# val_dataset = CustomDataset(processor)
train_dataloader = DataLoader(train_dataset, batch_size=1, shuffle=True, num_workers=16)

optimizer = AdamW(model.parameters(), lr=5e-5)

accelerator = Accelerator()
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

model.train()
for epoch in range(20):  # loop over the dataset multiple times
    for batch in train_dataloader:
        # zero the parameter gradients
        optimizer.zero_grad()
        # batch = {k: v.to(device) for k, v in batch.items()}  # device placement handled by accelerator.prepare

        # forward pass
        outputs = model(**batch)

        # backward pass + optimize
        loss = outputs.loss
        print("Loss:", loss.item())
        accelerator.backward(loss)
        optimizer.step()
This code also runs without errors, but again only GPU 0 does any work.
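One thing I may be missing: as far as I understand, Accelerate only spawns multiple processes when the script is started with `accelerate launch` (or `torchrun`), which sets distributed environment variables such as `WORLD_SIZE`; running plain `python script.py` gives a single process and hence a single GPU. A toy sketch of that inference (`inferred_world_size` is a hypothetical helper for illustration, not an Accelerate API):

```python
def inferred_world_size(env):
    """Number of processes a distributed run would see from its environment.

    Launchers like `accelerate launch` export WORLD_SIZE; a bare `python`
    invocation does not, so the run falls back to a single process.
    """
    return int(env.get("WORLD_SIZE", "1"))

print(inferred_world_size({}))                   # 1 -> single process, GPU 0 only
print(inferred_world_size({"WORLD_SIZE": "6"}))  # 6 -> set by the launcher
```

Is launching the script this way all that is needed, or does the training loop itself also need changes?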
I'm quite sure I'm missing something here. Could you please point me in the right direction? Thank you so much!