bentrevett/pytorch-sentiment-analysis

About semi-supervised sentiment analysis

zewei-long opened this issue · 2 comments

Hi @bentrevett!
I want to apply a semi-supervised learning method to the IMDB dataset, so I am trying to use your code (upgraded sentiment analysis). It is fantastic. However, I don't know how to turn some of the fully labeled data into unlabeled data. I also don't know how to ignore the unlabeled data when calculating the loss (it seems that BCEWithLogitsLoss() cannot ignore -1 the way CrossEntropyLoss() can). I really hope you can help me!

One solution is to rewrite a bit of the tutorial to use CrossEntropyLoss, whose ignore_index argument lets you label examples with -1 so they are skipped when the loss is computed.

You'll need to:

  • change the LABEL = data.LabelField(dtype = torch.float) field to LABEL = data.LabelField(), i.e. get rid of the cast to float
  • change OUTPUT_DIM to 2
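  • change criterion = nn.BCEWithLogitsLoss() to criterion = nn.CrossEntropyLoss(ignore_index = -1). The default ignore_index is -100, so it has to be set explicitly for -1 labels to be skipped.
  • replace the binary_accuracy function with: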
def categorical_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """
    max_preds = preds.argmax(dim = 1, keepdim = True) # get the index of the max probability
    correct = max_preds.squeeze(1).eq(y)
    return correct.sum() / torch.FloatTensor([y.shape[0]])
  • change all calls to binary_accuracy to categorical_accuracy.
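As a rough illustration of the loss change above, here is a minimal sketch on toy tensors rather than the tutorial's actual model or iterators. The name `masked_categorical_accuracy` and the constant `IGNORE_LABEL` are made up for this example; the accuracy variant also skips the -1 examples, which is an optional extra and not something the steps above require.

```python
import torch
import torch.nn as nn

IGNORE_LABEL = -1  # assumed marker for unlabeled examples (hypothetical constant)

# ignore_index must be set explicitly; the PyTorch default is -100.
criterion = nn.CrossEntropyLoss(ignore_index=IGNORE_LABEL)


def masked_categorical_accuracy(preds, y, ignore_label=IGNORE_LABEL):
    """Accuracy over labeled examples only (hypothetical helper)."""
    labeled = y != ignore_label            # boolean mask of labeled examples
    if labeled.sum() == 0:                 # batch with no labels at all
        return torch.tensor(0.0)
    max_preds = preds.argmax(dim=1)        # index of the largest logit per example
    correct = max_preds[labeled].eq(y[labeled])
    return correct.float().mean()


# Toy batch: 4 examples, 2 classes (OUTPUT_DIM = 2); the third one is unlabeled.
preds = torch.tensor([[2.0, 0.5], [0.1, 1.5], [3.0, -1.0], [0.2, 0.3]])
y = torch.tensor([0, 1, IGNORE_LABEL, 0])

loss = criterion(preds, y)                 # averaged over the 3 labeled examples
acc = masked_categorical_accuracy(preds, y)
```

With the default reduction, the loss is averaged over the labeled examples only, so the unlabeled example contributes nothing to the gradient.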

I think that is all you need to do. I might be forgetting something, so let me know if that works for you.
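On the other part of the question (turning some labeled examples into unlabeled ones), one possible approach is to overwrite a fixed random subset of the integer labels with -1. The sketch below works on a plain tensor of class indices and is not tied to the tutorial's torchtext fields; `hide_labels` and its parameters are hypothetical names, and how the masked labels get back into the dataset depends on your data pipeline.

```python
import torch


def hide_labels(labels, unlabeled_fraction, seed=0, ignore_label=-1):
    """Return a copy of `labels` with a random subset set to `ignore_label`.

    Hypothetical helper: `labels` is assumed to be a 1-D tensor of class indices.
    """
    generator = torch.Generator().manual_seed(seed)   # fixed seed, so the subset is reproducible
    num_hidden = int(unlabeled_fraction * labels.shape[0])
    hidden_idx = torch.randperm(labels.shape[0], generator=generator)[:num_hidden]
    masked = labels.clone()                           # leave the original labels untouched
    masked[hidden_idx] = ignore_label
    return masked


labels = torch.tensor([0, 1, 1, 0, 1, 0, 0, 1, 1, 0])
semi_labels = hide_labels(labels, unlabeled_fraction=0.5)  # 5 of the 10 labels become -1
```

Fixing the seed keeps the same examples unlabeled across epochs, which is usually what a semi-supervised setup expects.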

Thanks! It works.