ThilinaRajapakse/simpletransformers

I've been trying to run this model for a while and I'm getting this error.

This is the block of code where I'm getting the out-of-memory error:
import numpy as np
import torch
from datetime import datetime


def batch_gd(model, criterion, train_loader, validation_loader, epochs):
    train_losses = np.zeros(epochs)
    validation_losses = np.zeros(epochs)

    for e in range(epochs):
        t0 = datetime.now()

        # Training pass
        model.train()
        train_loss = []
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)

            optimizer.zero_grad()
            output = model(inputs)
            loss = criterion(output, targets)
            train_loss.append(loss.item())  # torch to numpy world

            loss.backward()
            optimizer.step()

        train_loss = np.mean(train_loss)

        # Validation pass: no_grad() skips building the autograd graph,
        # which also avoids holding activations in GPU memory.
        model.eval()
        validation_loss = []
        with torch.no_grad():
            for inputs, targets in validation_loader:
                inputs, targets = inputs.to(device), targets.to(device)

                output = model(inputs)
                loss = criterion(output, targets)
                validation_loss.append(loss.item())  # torch to numpy world

        validation_loss = np.mean(validation_loss)

        train_losses[e] = train_loss
        validation_losses[e] = validation_loss

        dt = datetime.now() - t0
        print(
            f"Epoch : {e+1}/{epochs} Train_loss:{train_loss:.3f} "
            f"Validation_loss:{validation_loss:.3f} Duration:{dt}"
        )

    return train_losses, validation_losses

batch_size = 64
train_loader = torch.utils.data.DataLoader(
    dataset, batch_size=batch_size, sampler=train_sampler
)
test_loader = torch.utils.data.DataLoader(
    dataset, batch_size=batch_size, sampler=test_sampler
)
validation_loader = torch.utils.data.DataLoader(
    dataset, batch_size=batch_size, sampler=validation_sampler
)

train_losses, validation_losses = batch_gd(
    model, criterion, train_loader, validation_loader, 5
)

Error: CUDA out of memory

As the error suggests, the memory allocation exceeds the available CUDA memory. Can you try a reduced batch size, e.g. 32 or 16?
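
A minimal sketch of that change, reusing the dataset and samplers from the snippet above (everything else stays the same); drop batch_size to 32 first, then to 16 if it still runs out of memory:

batch_size = 32  # reduced from 64; try 16 if 32 still exceeds GPU memory
train_loader = torch.utils.data.DataLoader(
    dataset, batch_size=batch_size, sampler=train_sampler
)
validation_loader = torch.utils.data.DataLoader(
    dataset, batch_size=batch_size, sampler=validation_sampler
)

Smaller batches lower the peak activation memory of each forward/backward pass, at the cost of noisier gradient estimates per step.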