About semi-supervised sentiment analysis

Question

zewei-long opened this issue 4 years ago · 2 comments

Hi, @bentrevett!
I'd like to apply a semi-supervised learning method to the IMDB dataset, so I'm trying to use your code (upgraded sentiment analysis). It is fantastic. However, I don't know how to turn part of the fully labeled data into unlabeled data, and I don't know how to ignore the unlabeled data when calculating the loss (it seems that BCEWithLogitsLoss() cannot ignore -1 the way CrossEntropyLoss() does). I really hope you can help me!

Answer 1 · 2020-06-08T18:10:24.000Z

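One solution is to rewrite part of the tutorial to use CrossEntropyLoss, which takes an ignore_index argument for skipping some examples. Its default is -100, so pass ignore_index = -1 if you want to mark unlabeled examples with -1.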

You'll need to:

change the LABEL = data.LabelField(dtype = torch.float) field to LABEL = data.LabelField(), i.e. get rid of the cast to float
change OUTPUT_DIM to 2
change criterion = nn.BCEWithLogitsLoss() to criterion = nn.CrossEntropyLoss(ignore_index = -1)
replace the binary_accuracy function with:

import torch

def categorical_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """
    max_preds = preds.argmax(dim = 1, keepdim = True) # index of the highest-scoring class
    correct = max_preds.squeeze(1).eq(y)
    # divide by a Python int so the result stays on the same device as preds
    return correct.sum().float() / y.shape[0]

change all calls to binary_accuracy to categorical_accuracy.
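
To check that the loss really skips unlabeled examples, here is a minimal sketch. It assumes -1 is the marker for "unlabeled" and uses a made-up batch of logits; none of these values come from the tutorial.

```python
import torch
import torch.nn as nn

# Hypothetical batch: 4 examples, 2 classes (OUTPUT_DIM = 2).
logits = torch.tensor([[2.0, -1.0],
                       [0.5, 0.3],
                       [-1.2, 0.8],
                       [0.1, 0.1]])

# -1 marks an unlabeled example (a convention assumed for this sketch).
labels = torch.tensor([0, -1, 1, -1])

# ignore_index must be set explicitly, since the default is -100.
criterion = nn.CrossEntropyLoss(ignore_index=-1)
loss_with_unlabeled = criterion(logits, labels)

# The same loss computed on the labeled rows only, for comparison.
labeled = labels != -1
loss_labeled_only = nn.CrossEntropyLoss()(logits[labeled], labels[labeled])

print(loss_with_unlabeled.item(), loss_labeled_only.item())
```

The two values should match, because with the default mean reduction the loss is averaged over the non-ignored targets only. A batch that contains nothing but unlabeled examples has no labeled terms to average, so that case is worth guarding against in the training loop.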

I think that is all you need to do. I might be forgetting something, so let me know if that works for you.
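
Two loose ends for the semi-supervised case can be sketched as follows: turning some labels into "unlabeled", and keeping unlabeled examples out of the accuracy. The names UNLABELED, mask_labels and masked_categorical_accuracy are hypothetical helpers, not part of the tutorial, and the sketch works on plain label tensors; wiring it into the torchtext fields is not shown here.

```python
import torch

UNLABELED = -1  # assumed marker, matching ignore_index = -1 in the loss


def mask_labels(y, unlabeled_fraction):
    """Return a copy of y with roughly `unlabeled_fraction` of entries set to UNLABELED."""
    drop = torch.rand(y.shape, device=y.device) < unlabeled_fraction
    masked = y.clone()
    masked[drop] = UNLABELED
    return masked


def masked_categorical_accuracy(preds, y):
    """Accuracy over labeled examples only; returns 0.0 if the batch has none."""
    labeled = y != UNLABELED
    if labeled.sum() == 0:
        return preds.new_tensor(0.0)
    correct = preds.argmax(dim=1).eq(y) & labeled
    return correct.sum().float() / labeled.sum().float()


# Hypothetical batch: the second example is unlabeled.
preds = torch.tensor([[2.0, -1.0],
                      [0.5, 0.3],
                      [-1.2, 0.8],
                      [0.1, 0.9]])
y = torch.tensor([0, UNLABELED, 1, 0])
acc = masked_categorical_accuracy(preds, y)  # 2 of the 3 labeled examples are correct
```

Calling mask_labels once on the full label tensor, rather than on every batch, keeps the same examples unlabeled across epochs, which is usually what a semi-supervised setup wants.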

Answer 2 · 2020-06-09T09:04:21.000Z

Thanks! It works.