vislearn/FrEIA

Train on multiple GPUs with nn.DataParallel()

delitroz opened this issue · 1 comment

Hello,

I have an issue when trying to train an INN on multiple GPUs using nn.DataParallel().
I recreated a minimal example of the problem from the example at the beginning of the README. When I build the INN with Ff.SequenceINN() (commented out in the code below), everything works as intended and I can see all of my GPUs being utilized. But when I use Ff.ReversibleGraphNet(), I get an error saying that several tensors are on mismatched devices (note: the same construction works fine on a single GPU).
Since I am planning to implement a convolutional conditional INN, I will need something more expressive than a plain SequenceINN(), in particular something that supports splitting (a rough sketch of what I mean follows the minimal example below).

Has anyone had success training such a model on multiple GPUs?
Thanks.

import torch
import torch.nn as nn
from sklearn.datasets import make_moons

import FrEIA.framework as Ff
import FrEIA.modules as Fm

from tqdm import trange

BATCHSIZE = 1000
N_DIM = 2

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def subnet_fc(dims_in, dims_out):
    return nn.Sequential(nn.Linear(dims_in, 512), nn.ReLU(),
                         nn.Linear(512,  dims_out))

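# This SequenceINN version works as expected when wrapped in nn.DataParallel: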
# inn = Ff.SequenceINN(N_DIM)
# for k in range(8):
#     inn.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc, permute_soft=True)

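# The equivalent graph-based construction below is the one that fails with
# the device mismatch under nn.DataParallel: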
nodes = []
nodes.append(Ff.InputNode(2))
for k in range(8):
    nodes.append(Ff.Node(nodes[-1],
                Fm.AllInOneBlock,
                {'subnet_constructor':subnet_fc, 'permute_soft':True}))
nodes.append(Ff.OutputNode(nodes[-1]))
inn = Ff.ReversibleGraphNet(nodes)

inn = nn.DataParallel(inn)
inn = inn.to(device)

optimizer = torch.optim.Adam(inn.parameters(), lr=0.001)

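# maximum likelihood training: per-sample loss is 0.5*||z||^2 - log|det J|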
for i in trange(1000):
    optimizer.zero_grad()
    data, label = make_moons(n_samples=BATCHSIZE, noise=0.05)
    x = torch.Tensor(data).to(device)
    z, log_jac_det = inn(x)
    loss = 0.5*torch.sum(z**2, 1) - log_jac_det
    loss = loss.mean() / N_DIM
    loss.backward()
    optimizer.step()

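# sample from the learned distribution by running the INN in reverse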
z = torch.randn(100, N_DIM).to(device)
samples, _ = inn(z, rev=True)
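
For reference, this is roughly the kind of split-based graph I would like to build eventually (an untested sketch on a 4-dimensional toy input; I am taking the Fm.Split / Fm.Concat modules and the section_sizes argument from the FrEIA docs as I understand them, so the exact names may be slightly off):

in_node = Ff.InputNode(4, name='input')
block_1 = Ff.Node(in_node, Fm.AllInOneBlock,
                  {'subnet_constructor': subnet_fc, 'permute_soft': True},
                  name='block_1')
# split the 4 dims into two branches of 2 dims each
split = Ff.Node(block_1, Fm.Split, {'section_sizes': [2, 2]}, name='split')
# transform only the first branch, leave the second untouched
block_2 = Ff.Node(split.out0, Fm.AllInOneBlock,
                  {'subnet_constructor': subnet_fc, 'permute_soft': True},
                  name='block_2')
# merge the two branches again and close the graph
concat = Ff.Node([block_2.out0, split.out1], Fm.Concat, {}, name='concat')
out_node = Ff.OutputNode(concat, name='output')
graph_inn = Ff.ReversibleGraphNet([in_node, block_1, split, block_2, concat, out_node])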

Did you manage to find a solution?