ilivans/tf-rnn-attention

Finding actual length of sequence

snehasinghania opened this issue · 5 comments

Why is there a +1 at line 114 of train.py, seq_len = np.array([list(x).index(0) + 1 for x in x_batch])  # actual lengths of sequences, and again at line 129? Wouldn't list(x).index(0) without the +1 be enough?

With that +1 we include the EOS token (which has index 0) in the passed sequence length, so this token will be processed by the RNN cell at the end of the sequence. In this way we explicitly tell the cell that the sequence is over, and it may change its state according to this information.
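To make the effect of the +1 concrete, here is a small NumPy sketch on a made-up zero-padded batch (the ids and shapes are hypothetical, not the repo's data):

```python
import numpy as np

# Hypothetical toy batch: each row is a zero-padded sequence of word ids,
# where id 0 marks both the EOS token and the padding.
x_batch = np.array([
    [5, 12, 7, 0, 0, 0],   # e.g. "I saw him" + EOS + padding
    [3, 9, 0, 0, 0, 0],
])

# Same expression as in train.py: position of the first 0, plus 1,
# so the EOS token itself is counted in the sequence length.
seq_len = np.array([list(x).index(0) + 1 for x in x_batch])
print(seq_len)  # [4 3]

# Without the +1 the EOS token would fall outside the length,
# and the RNN cell would never process it.
```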

Since we are specifying the length of the sequence, won't the RNN cell already know where the sequence ends? What sort of state change will occur if we explicitly provide 0 at the end? And what will this state represent intuitively? Please explain.

The state will represent the whole completed sequence. Intuitively, when you don't provide the EOS token, the RNN cell may be waiting for some new information and thus holding an intermediate state. For example, consider the sequence "I saw him". If you fed only these 3 tokens to the cell, it might still be waiting for the next token to be something like "running". To help the cell understand that it doesn't need to wait for anything else, and that it can finalize its state using the EOS token embedding, we pass that token explicitly. This is the intuition I have.
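For reference, this is roughly how the computed lengths reach the cell in TF 1.x. A minimal, self-contained sketch using tf.nn.dynamic_rnn with sequence_length; the shapes are made up and this is not necessarily the repo's exact graph:

```python
import tensorflow as tf  # TF 1.x style, as used by the repo

# Hypothetical dimensions for illustration only.
batch_size, max_time, embed_dim, hidden_size = 32, 250, 100, 128

inputs_ph = tf.placeholder(tf.float32, [batch_size, max_time, embed_dim])
seq_len_ph = tf.placeholder(tf.int32, [batch_size])  # fed with seq_len from above

cell = tf.nn.rnn_cell.GRUCell(hidden_size)
# dynamic_rnn stops updating each example's state after seq_len_ph steps,
# so when the length includes the EOS token, that token is the last step
# the cell actually processes before the state is frozen.
outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs_ph, sequence_length=seq_len_ph, dtype=tf.float32)
```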

Great post, Ilivans.
I just wonder why we have to add the EOS token (which has index 0) to the passed sequence.
If some sentences don't have 0 at the end of the sequence,
what would happen when passing them into the model?
Thanks

@bgg11117
I think this question is out of the repo's scope, but I'll try to explain.
Adding an EOS token at the end of a sequence is a common pre-processing step that is applied to all incoming examples, so the model conceptually never sees sentences without this token.
But you can intentionally skip that pre-processing step for some of the input examples during training or testing; I guess that's what you're asking about. I suppose the model will be confused if it has already learned how to use that token successfully; otherwise it won't make much of a difference. But skipping pre-processing procedures just doesn't make sense anyway, because you confuse the model on purpose by introducing some kind of noise.
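As an illustration of that pre-processing step, here is a hypothetical helper (the name pad_with_eos is mine, not from the repo) that appends the EOS id 0 to every example and pads to a fixed length:

```python
import numpy as np

def pad_with_eos(token_ids, max_len, eos_id=0):
    """Hypothetical pre-processing helper: append the EOS id (0) to every
    example, then pad with the same id up to a fixed length."""
    ids = list(token_ids)[: max_len - 1] + [eos_id]   # truncate if needed, add EOS
    ids += [eos_id] * (max_len - len(ids))            # pad the remainder
    return ids

batch = [pad_with_eos(s, max_len=6) for s in [[5, 12, 7], [3, 9]]]
print(np.array(batch))
# [[ 5 12  7  0  0  0]
#  [ 3  9  0  0  0  0]]
```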