uzh-rpg/RVT

Questions related to the plain LSTM Cell

Closed this issue · 5 comments

Hello, I'm quite interested in the section about LSTM, but I'm still trying to understand how the "plain LSTM cell" mentioned in the paper is actually applied. It seems like the backbone network uses "DWSConvLSTM2d", which I assume stands for depthwise-separable convolutional LSTM?

Hi @batman47steam

DWSConvLSTM2d can function as a plain LSTM when configured accordingly. It's designed to be flexible and lets you toggle between a standard LSTM and a depthwise-separable conv LSTM. The configs show that it's set up to act like a regular LSTM: it only uses 1x1 convolutions, which are mathematically equivalent to a matrix multiplication applied independently at every spatial location.
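To make the 1x1 equivalence concrete, here is a minimal framework-free sketch (plain Python, integer values so the comparison is exact). The helper names `conv2d_valid` and `linear_per_pixel` are made up for this illustration and are not part of RVT or PyTorch; the sketch only shows that a convolution with a 1x1 kernel gives the same result as multiplying each pixel's channel vector by the same weight matrix.

```python
import random


def conv2d_valid(x, weight, bias):
    """Naive 2D convolution (cross-correlation), stride 1, no padding.

    x: [C_in][H][W], weight: [C_out][C_in][k][k], bias: [C_out].
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(weight[0][0])
    out = []
    for o in range(len(weight)):
        plane = []
        for y in range(h - k + 1):
            row = []
            for col in range(w - k + 1):
                acc = bias[o]
                for c in range(c_in):
                    for dy in range(k):
                        for dx in range(k):
                            acc += weight[o][c][dy][dx] * x[c][y + dy][col + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def linear_per_pixel(x, matrix, bias):
    """Apply y = M v + b to the channel vector v of every pixel separately."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = [[[0] * w for _ in range(h)] for _ in matrix]
    for y in range(h):
        for col in range(w):
            v = [x[c][y][col] for c in range(c_in)]
            for o, m_row in enumerate(matrix):
                out[o][y][col] = bias[o] + sum(m * vi for m, vi in zip(m_row, v))
    return out


rng = random.Random(0)
C_IN, C_OUT, H, W = 3, 4, 2, 5  # arbitrary sizes for the demo
x = [[[rng.randint(-5, 5) for _ in range(W)] for _ in range(H)] for _ in range(C_IN)]
matrix = [[rng.randint(-5, 5) for _ in range(C_IN)] for _ in range(C_OUT)]
bias = [rng.randint(-5, 5) for _ in range(C_OUT)]

# A 1x1 kernel is the same weight matrix with two trailing singleton dims.
kernel_1x1 = [[[[m]] for m in m_row] for m_row in matrix]

conv_out = conv2d_valid(x, kernel_1x1, bias)
linear_out = linear_per_pixel(x, matrix, bias)
outputs_match = conv_out == linear_out
print("1x1 conv equals per-pixel matmul:", outputs_match)
```

Because no pixel ever sees its neighbours, a cell built only from 1x1 convolutions behaves like an ordinary LSTM that is run at each spatial location with shared weights.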

Hi, thank you for your response. So the default settings of DWSConvLSTM2d correspond to the plain LSTM, where the hidden state undergoes a 3x3 convolution and only a 1x1 convolution mixes the input and the hidden state?

There is no 3x3 convolution in the default setup; it's just a 1x1 convolution. You can see this here:

  • The default config specifies lstm.dws_conv=False here.
  • The config is passed to the class init function here.
  • As a consequence, self.conv3x3_dws is set to nn.Identity() here.
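
The three steps above can be sketched roughly as follows in PyTorch. This is a simplified illustration, not RVT's actual DWSConvLSTM2d: the class name `ToggleConvLSTMCell`, the `dws_kernel_size` argument, the gate ordering and the zero initial state are all assumptions made for the example. Only the `dws_conv` flag and the `conv3x3_dws` attribute being `nn.Identity()` when the flag is off come from the discussion.

```python
import torch
from torch import nn


class ToggleConvLSTMCell(nn.Module):
    """Hypothetical sketch of an LSTM cell with an optional depthwise conv."""

    def __init__(self, dim: int, dws_conv: bool = False, dws_kernel_size: int = 3):
        super().__init__()
        if dws_conv:
            # Depthwise conv on the hidden state: one spatial filter per channel.
            self.conv3x3_dws = nn.Conv2d(
                dim, dim, kernel_size=dws_kernel_size,
                padding=dws_kernel_size // 2, groups=dim,
            )
        else:
            # Flag off: the hidden state passes through unchanged.
            self.conv3x3_dws = nn.Identity()
        # The 1x1 conv mixes the channels of [x, h] into the four gates.
        self.conv1x1 = nn.Conv2d(2 * dim, 4 * dim, kernel_size=1)

    def forward(self, x, state=None):
        if state is None:
            # Assumed zero initial hidden and cell state.
            h, c = torch.zeros_like(x), torch.zeros_like(x)
        else:
            h, c = state
        h = self.conv3x3_dws(h)
        gates = self.conv1x1(torch.cat([x, h], dim=1))
        # Gate order is an assumption of this sketch.
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


torch.manual_seed(0)
plain_cell = ToggleConvLSTMCell(dim=8, dws_conv=False)
x = torch.randn(2, 8, 5, 7)  # (batch, channels, height, width)
h, c = plain_cell(x)
print(type(plain_cell.conv3x3_dws).__name__, tuple(h.shape))
```

With `dws_conv=False` the only learned operation is the 1x1 convolution, so each pixel is processed independently, which is what makes the cell a plain LSTM in this configuration.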

@batman47steam if that answers your question, feel free to close this issue

Thank you. That answers my question very well.