ruizheliUOA/HR-VAE

Wrong use bidirectional LSTM

kklemon opened this issue · 3 comments

I might be mistaken, but the way bidirectional LSTMs are used in the encoder module does not really make sense.

In PyTorch, bidirectionality is handled by the underlying implementation of the LSTM module (e.g. by cuDNN) and is applied on the whole input sequence. As Encoder implements HR-LSTM by using a normal nn.LSTM but in a custom loop with one step at a time, the bidirectional LSTM would internally run bidirectional only on the given sequence of length one which practically would reduce it to two LSTMs being applied independently but both in forward direction.

I guess the simplest way to implement proper bidirectional handling would be to use two separate LSTMs where one processes the input in the normal and the other in the reverse direction.

I might be mistaken, but the way bidirectional LSTMs are used in the encoder module does not really make sense.

In PyTorch, bidirectionality is handled by the underlying implementation of the LSTM module (e.g. by cuDNN) and is applied on the whole input sequence. As Encoder implements HR-LSTM by using a normal nn.LSTM but in a custom loop with one step at a time, the bidirectional LSTM would internally run bidirectional only on the given sequence of length one which practically would reduce it to two LSTMs being applied independently but both in forward direction.

I guess the simplest way to implement proper bidirectional handling would be to use two separate LSTMs where one processes the input in the normal and the other in the reverse direction.

Hi there, I am sorry about replying you so late. Actually our model does not use the bidirectional function in the encoder and decoder if you check the code in HR-VAE.py. However, I think you might be right according to how to implement the bidirectional function in RNN. I will try your suggestions and see the difference compared to the PyTorch function.

It's actually not mentioned in the paper, but I was just a bit confused because the encoder and decoder class offer the option for it so I thought it was meant to work at some point.

It's actually not mentioned in the paper, but I was just a bit confused because the encoder and decoder class offer the option for it so I thought it was meant to work at some point.

The option was just a try when I wrote the encoder and decoder class for the first time (forgot to delete it finally). But it might work better if the bidirectional function is set True (I did not test it although).