falloutdurham/beginners-pytorch-deep-learning

chapter 2 - training loss different on GitHub and in the book

g-i-o-r-g-i-o opened this issue · 1 comment

Hello, about the code in chapter 2, in the training section: I wonder why the loss calculation is different from the one in the book.

def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=5, device="cpu"):
    for epoch in range(1, epochs+1):

[...]

from github:

            training_loss += loss.data.item() * inputs.size(0)
        training_loss /= len(train_loader.dataset)

Here you multiply loss.data.item() by inputs.size(0), which is the number of elements in the input batch (64 elements). Since, by default, "the losses are averaged over each loss element in the batch", this recovers the cumulative loss for the entire batch.
Then you divide training_loss by the total number of elements in the dataset (about 800 elements), so this is the mean loss per element.
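Put together, the accumulation pattern looks roughly like this (a sketch reusing the names from the snippet above; the backward/optimizer steps are omitted, and the batch size of 64 and dataset size of ~800 are assumptions taken from the question, not from the GitHub code):

    # Sketch of the per-sample running-loss pattern (training step details omitted)
    training_loss = 0.0
    for batch in train_loader:
        inputs, targets = batch
        output = model(inputs)
        loss = loss_fn(output, targets)                      # mean loss over this batch (default reduction)
        training_loss += loss.data.item() * inputs.size(0)   # undo the mean -> sum of losses in the batch
    training_loss /= len(train_loader.dataset)               # mean loss per sample over the whole epoch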

from the book:

            training_loss += loss.data.item()
        training_loss /= len(train_iterator)

The book code seems wrong in this case, and the GitHub code seems right. Is it because in the last row you have to divide by the length of the batch?

[...]

from github:

            valid_loss += loss.data.item() * inputs.size(0)
            correct = torch.eq(torch.max(F.softmax(output, dim=1), dim=1)[1], targets)
            num_correct += torch.sum(correct).item()
            num_examples += correct.shape[0]
        valid_loss /= len(val_loader.dataset)

from the book:

            valid_loss += loss.data.item()
            correct = torch.eq(torch.max(F.softmax(output), dim=1)[1], target).view(-1)
            num_correct += torch.sum(correct).item()
            num_examples += correct.shape[0]
            valid_loss /= len(valid_iterator)

Yes, there were a few typos in the book release which have been corrected in the meantime, e.g. train_iterator not being defined. Additionally, the PyTorch API has changed in ways that lead to deprecation messages arising from missing dimension arguments in torch.max and F.softmax (df4e02b#diff-1abbb440e65745ab4ef4f92e32643594fc9af5130d03dfc6879613047f083bd9)
And we noticed that .view(-1) was redundant, so we removed it.
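In isolation, the corrected accuracy computation from the GitHub snippet looks like this (a sketch with added comments; output, targets, num_correct and num_examples are assumed to come from the validation loop above):

    import torch
    import torch.nn.functional as F

    probs = F.softmax(output, dim=1)        # explicit dim avoids the deprecation warning
    preds = torch.max(probs, dim=1)[1]      # [1] selects the argmax indices, one per sample
    correct = torch.eq(preds, targets)      # 1-D bool tensor, so .view(-1) adds nothing
    num_correct += torch.sum(correct).item()
    num_examples += correct.shape[0]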

"The book code seems wrong in this case, and github code is right. Because in the last row you had to divide by the length of the batch?"

Yes, the GitHub code is working. In general you are right about the inputs.size(0) multiplication (see e.g. https://stackoverflow.com/questions/61092523/what-is-running-loss-in-pytorch-and-how-is-it-calculated) and about averaging the loss, but in the last row you divide by the length of the whole dataset (len(train_loader.dataset), i.e. the number of items in the dataset), not the length of a single batch, because you were summing up the cumulative loss over all batches before. The same holds for val_loader.
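The distinction can be seen directly on a DataLoader (an illustrative sketch using the hypothetical sizes from the question, 800 items and a batch size of 64):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(800, 10), torch.randint(0, 2, (800,)))
    loader = DataLoader(dataset, batch_size=64)

    print(len(loader.dataset))  # 800 -> divisor used in the GitHub code (mean loss per sample)
    print(len(loader))          # 13  -> number of batches, the divisor len(train_iterator) would give

    # Summing per-batch mean losses and dividing by len(loader) gives the last,
    # smaller batch (800 % 64 = 32 items) the same weight as a full batch of 64,
    # so it only approximates the true per-sample mean.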