yalesong/pvse

Bidirectional GRU Bugs

CLT29 opened this issue · 1 comment

CLT29 commented

I'm currently porting your code over to run a comparison against it. I came across the following issue in model.py:

pvse/model.py

Lines 269 to 278 in 19071a8

packed = pack_padded_sequence(wemb_out, lengths, batch_first=True)
if torch.cuda.device_count() > 1:
    self.rnn.flatten_parameters()
rnn_out, _ = self.rnn(packed)
padded = pad_packed_sequence(rnn_out, batch_first=True)
# Reshape *final* output to (batch_size, hidden_size)
I = lengths.expand(self.embed_size, 1, -1).permute(2, 1, 0) - 1
out = torch.gather(padded[0], 1, I).squeeze(1)
out = self.dropout(out)

Let's modify the code above to preserve the final hidden state of the RNN.

>>> rnn_out, h_n = self.rnn(packed) 

Now let's inspect the shapes. (Note that the batch size and the hidden state size may differ in your case; these come from my implementation):

>>> h_n.shape
torch.Size([2, 32, 256])
>>> padded[0].shape
torch.Size([32, 15, 512])

So we see that h_n, which is the hidden state after consuming the final input, has 2 in its first dimension, since we have a bidirectional GRU. We also observe (in my case) that the hidden state size is 256 while the output size is 512; this is because PyTorch concatenates the two BiGRU (forward and backward) states in the output.
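
For anyone following along, here is a minimal standalone sketch (hypothetical sizes, a plain nn.GRU rather than the repo's encoder) that reproduces these shapes:

import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size = 32, 15, 300, 256
rnn = nn.GRU(input_size, hidden_size, batch_first=True, bidirectional=True)
x = torch.randn(batch_size, seq_len, input_size)

output, h_n = rnn(x)
print(h_n.shape)     # torch.Size([2, 32, 256])  -> (num_directions, batch, hidden_size)
print(output.shape)  # torch.Size([32, 15, 512]) -> forward and backward outputs concatenated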

Let's examine the 5th element in the batch. First, we find its length:

>>> padded[1][5]
tensor(11)

So we know the final hidden state occurs at index 10 (the remaining positions are padding). We can confirm this here:

>>> torch.equal(padded[0][5, 10, :256], h_n[0, 5, :])
True

Here is where the problem comes:

>>> torch.equal(padded[0][5, 10, 256:], h_n[1, 5, :])
False

In other words, what we find is that the final hidden state for the reverse direction, h_n[1, 5, :], does not equal the hidden state at padded[0][5, 10, 256:].

Similarly, we can verify this in your out variable:

>>> out.shape
torch.Size([32, 512])
>>> torch.equal(out[5,:256], h_n[0, 5, :])
True
>>> torch.equal(out[5,256:], h_n[1, 5, :])
False

In other words, the first half of out matches the forward state h_n[0], but the second half does not match the backward state h_n[1].

This is because the final hidden state for the reverse direction of the GRU is not at padded[0][5, 10, 256:]. Because it is reversed, it is actually located at index 0 in padded, not at index 10!

>>> torch.equal(padded[0][5, 10, 256:], h_n[1, 5, :])
False
>>> torch.equal(padded[0][5, 0, 256:], h_n[1, 5, :])
True

In other words, I believe your implementation doesn't use the correct hidden state for the reverse direction. If your reverse GRU processes the sequence S_n, ..., S_1 and produces hidden states h_1, ..., h_n, your implementation picks h_1, the state obtained after processing only the last item S_n, and ignores the rest of the sequence (in the reverse direction). The forward direction's final hidden state captures the entire input, but the reverse direction's state you use captures only the last item of the input and none of the earlier items. So the two hidden states you concatenate aren't even summarizing the same sequence.

Long story short, I think you want torch.equal(out[5, 256:], h_n[1, 5, :]) to return True, but it currently doesn't.
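
For what it's worth, here is a sketch of how the two true final states could be gathered (the helper last_hidden_bidirectional and its integration are mine, not from the repo; it assumes lengths is a 1-D long tensor, as the existing gather implies):

import torch

def last_hidden_bidirectional(padded_out, lengths, hidden_size):
    # padded_out: (batch, max_len, 2 * hidden_size), as returned by pad_packed_sequence
    # lengths:    (batch,) long tensor of true sequence lengths
    # forward direction: the final state sits at index length - 1
    idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, hidden_size)
    fwd = torch.gather(padded_out[:, :, :hidden_size], 1, idx).squeeze(1)
    # backward direction: the final state sits at index 0
    bwd = padded_out[:, 0, hidden_size:]
    return torch.cat((fwd, bwd), dim=1)

# in the text encoder this would replace the gather above, roughly:
# out = last_hidden_bidirectional(padded[0], lengths, self.rnn.hidden_size)
# out = self.dropout(out)

A simpler alternative is to keep the second return value of self.rnn(packed) and use torch.cat((h_n[0], h_n[1]), dim=1), since for packed input h_n already holds each direction's state at the last valid time step.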

I believe the issue is also present in VideoEncoder, here:

pvse/model.py

Lines 197 to 199 in 19071a8

states, _ = self.rnn(features)
states = self.dropout(states)
out = states[:, -1, :]

Again, it seems you are using the last time step for both directions of the GRU. This is correct for the forward direction, but not for the backward direction.
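
A corresponding sketch for VideoEncoder could look like the following (variable names follow the snippet above; it assumes self.rnn is a single-layer bidirectional GRU and that all video feature sequences share the same length, which is what the existing states[:, -1, :] slicing implies):

states, _ = self.rnn(features)
states = self.dropout(states)
H = self.rnn.hidden_size
# forward direction ends at the last time step; backward direction ends at time step 0
out = torch.cat((states[:, -1, :H], states[:, 0, H:]), dim=1)

The second return value of self.rnn could also be used here instead, but slicing states keeps the dropout behavior of the existing code.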

Thanks for reporting it and providing a fix.