About semi-supervised sentiment analysis
zewei-long opened this issue · 2 comments
zewei-long commented
Hi @bentrevett!
I'd like to apply a semi-supervised learning method to the IMDB dataset, so I'm using your code (upgraded sentiment analysis). It's fantastic! However, I don't know how to replace some of the previously all-labeled data with unlabeled data, and I don't know how to ignore the unlabeled data when calculating the loss (it seems that `BCEWithLogitsLoss()` cannot ignore `-1` the way `CrossEntropyLoss()` can). I really hope you can help me!
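For context, a minimal sketch of the manual-masking workaround this question is about, assuming a binary setup where `-1` marks unlabeled examples (the tensors are made-up stand-ins for a real batch):

```python
import torch
import torch.nn as nn

# Hypothetical batch: raw model outputs and float labels, with -1.0
# marking unlabeled examples (BCEWithLogitsLoss has no ignore_index).
preds = torch.tensor([0.8, -1.2, 0.3, 2.0])
labels = torch.tensor([1.0, 0.0, -1.0, -1.0])

criterion = nn.BCEWithLogitsLoss()

# Workaround: drop unlabeled examples before computing the loss.
mask = labels != -1
loss = criterion(preds[mask], labels[mask])
print(loss.item())
```

This keeps `BCEWithLogitsLoss`, at the cost of masking every batch by hand; the answer below avoids that by switching losses.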
bentrevett commented
One solution is to rewrite a bit of the tutorial to use `CrossEntropyLoss`, so you can use `-1` (via its `ignore_index` argument) to ignore some examples.
You'll need to:
- change the `LABEL = data.LabelField(dtype = torch.float)` field to `LABEL = data.LabelField()`, i.e. get rid of the cast to float
- change `OUTPUT_DIM` to `2`
- change `criterion = nn.BCEWithLogitsLoss()` to `criterion = nn.CrossEntropyLoss()`
- replace the `binary_accuracy` function with:

```python
def categorical_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """
    max_preds = preds.argmax(dim = 1, keepdim = True) # get the index of the max probability
    correct = max_preds.squeeze(1).eq(y)
    return correct.sum() / torch.FloatTensor([y.shape[0]])
```

- change all calls to `binary_accuracy` to `categorical_accuracy`.
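Putting those changes together, here's a small sketch of how `ignore_index` then skips the unlabeled examples when computing the loss (the logits and labels below are made-up stand-ins for a real batch):

```python
import torch
import torch.nn as nn

# Hypothetical mini-batch: logits for 4 examples over 2 classes.
# Labels 0/1 are real sentiment labels; -1 marks unlabeled examples.
logits = torch.tensor([[ 2.0, -1.0],
                       [ 0.5,  1.5],
                       [ 1.0,  1.0],
                       [-2.0,  2.0]])
labels = torch.tensor([0, 1, -1, -1])

# ignore_index tells CrossEntropyLoss to skip these targets entirely,
# averaging only over the non-ignored ones.
criterion = nn.CrossEntropyLoss(ignore_index=-1)
loss = criterion(logits, labels)

# Sanity check: the loss equals the plain loss over just the labeled rows.
labeled = labels != -1
reference = nn.CrossEntropyLoss()(logits[labeled], labels[labeled])
print(torch.isclose(loss, reference))
```

Note that `categorical_accuracy` as written still counts unlabeled examples as wrong, so if your training batches mix labeled and unlabeled data you may want to apply the same `labels != -1` mask before computing accuracy.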
I think that's all you need to do. I might be forgetting something, so let me know if that works for you.
zewei-long commented
Thanks! It works.