bentrevett/pytorch-sentiment-analysis

Question about computing loss

TalitaGroetzinger opened this issue · 3 comments

Hi!

First of all thank you so much for these awesome tutorials. They are so useful :).

So far I haven't encountered any issues with the notebooks, but I have a question about how you compute the loss (e.g., in the sentiment analysis notebook and the upgraded sentiment analysis notebook). I have seen examples in books and on the internet where the loss is computed as follows:

    training_loss = 0.0
    for batch in train_iterator:
        optimizer.zero_grad()
        predict = model(batch.text)
        loss = criterion(predict, batch.label)  # mean loss over the batch (default reduction)
        loss.backward()
        optimizer.step()
        # scale the batch-mean loss back up by the batch size before accumulating
        training_loss += loss.item() * batch.batch_size
    training_loss /= len(train_iterator)  # average over the number of batches

So they multiply the loss by the batch_size. As far as I know, you don't do that in your examples, and it's difficult for me to understand which solution is better, since I am a newbie in PyTorch and deep learning. Could you perhaps explain why it's better not to multiply the loss by the batch size?

Glad you found the tutorials useful!

The code you provided still calculates the loss for the weight updates in exactly the same way; it only changes the loss value returned by the train and evaluate functions, which we only use for printing to the user.

In the tutorials, the train and evaluate functions return the average loss per batch (the criterion already averages over the examples within each batch), whereas the code you have provided scales each batch's loss back up by the batch size, so it tracks the loss summed over the examples in each batch instead. I think this is just a matter of preference and only really matters if you are specifically concerned with the actual values of the loss.
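
To make this concrete, here is a minimal sketch of the two accumulation styles side by side (variable names are illustrative, not taken from the notebooks); note that the backward pass and optimizer step are identical in both cases, only the number being logged differs:

    epoch_loss_per_batch = 0.0  # tutorial style: accumulate the batch-mean loss
    epoch_loss_summed = 0.0     # book style: scale the batch-mean loss by the batch size
    num_batches = 0

    for batch in train_iterator:
        optimizer.zero_grad()
        predictions = model(batch.text)
        loss = criterion(predictions, batch.label)  # mean over the examples in this batch
        loss.backward()
        optimizer.step()  # the weight update is the same in both cases

        epoch_loss_per_batch += loss.item()
        epoch_loss_summed += loss.item() * batch.batch_size
        num_batches += 1

    # average of the per-batch means (what the tutorials print)
    print(epoch_loss_per_batch / num_batches)
    # average summed loss per batch (roughly the value above times the batch size)
    print(epoch_loss_summed / num_batches)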

In these tutorials, we are more concerned with the accuracy values, as they are easier to interpret. For example, if we got a loss of 0.5 (averaging the per-batch loss) or 16 (scaling by a batch size of 32), how could we tell whether these are good or not? Obviously lower loss is better, but is a loss of 0.5 or 16 good enough for deployment in the real world? It's difficult to say, hence we use other metrics (such as accuracy) that give us a value we can actually understand.
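
For reference, the accuracy metric for binary sentiment labels can be as simple as the following sketch (the notebooks use a similar helper; the exact implementation there may differ slightly):

    import torch

    def binary_accuracy(preds, y):
        # preds are raw logits, so squash them with a sigmoid and round to 0/1
        rounded_preds = torch.round(torch.sigmoid(preds))
        # fraction of predictions that match the true labels
        correct = (rounded_preds == y).float()
        return correct.sum() / len(correct)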

Hope this makes sense, let me know if anything needs clarifying.

Also, I have always thought the default should be loss per batch. See: https://github.com/pytorch/examples/blob/master/mnist/main.py

Thank you so much for your reply! Yes, this definitely makes sense!