bentrevett/pytorch-sentiment-analysis

Updated Sentiment Analysis : what's the impact of not using packed_padded_sequence()?

githubrandomuser2017 opened this issue · 1 comment

Thanks for your awesome tutorials. In the one for "Updated Sentiment Analysis", you wrote the following:

Without packed padded sequences, hidden and cell are tensors from the last element in the sequence, which will most probably be a pad token, however when using packed padded sequences they are both from the last non-padded element in the sequence.

What does this mean exactly? If I'm using an LSTM, the final hidden state is an ongoing representation of the sequence up to and including the last token. If the last few tokens are <PAD>, would that matter since the hidden state already captured the previous non-<PAD> tokens?

In theory it shouldn't matter, as your RNN should learn to ignore pad tokens and not update its internal hidden state when it sees a <pad> token. However, the RNN has to learn that explicitly: it starts with no prior knowledge that <pad> tokens carry no information. By using packed padded sequences we avoid that problem altogether, since the model never sees the <pad> tokens in the first place.
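A minimal sketch of the difference, assuming PyTorch and toy dimensions (the layer sizes and the length-2 sequence below are illustrative, not from the tutorial). Without packing, the returned `hidden` comes from the final (padded) time step; with `pack_padded_sequence`, it comes from the last real token:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

# Hypothetical tiny LSTM: 4-dim inputs, 3-dim hidden state.
lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True)

# One sequence of true length 2, padded with zero vectors to length 5.
seq = torch.randn(1, 2, 4)
padded = torch.cat([seq, torch.zeros(1, 3, 4)], dim=1)  # shape [1, 5, 4]

# Without packing: hidden is taken after step 5, a pad position.
_, (h_padded, _) = lstm(padded)

# With packing: hidden is taken after step 2, the last non-pad token.
packed = pack_padded_sequence(padded, lengths=torch.tensor([2]),
                              batch_first=True)
_, (h_packed, _) = lstm(packed)

# Reference: run the LSTM on the unpadded sequence alone.
_, (h_true, _) = lstm(seq)

print(torch.allclose(h_packed, h_true))  # True: packing stops at the real end
print(torch.allclose(h_padded, h_true))  # False: pad steps kept updating hidden
```

The second comparison fails because even all-zero inputs update the hidden state (the recurrent weights and biases still apply at every step), which is exactly the effect packing removes.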