ConvLSTM

Source code associated with Spatio-temporal video autoencoder with differentiable memory, published in the ICLR 2016 Workshop track.

This is a demo version, to be trained on a modified version of the moving MNIST dataset, available here. Some videos obtained on real test sequences are also available here (though they are not up to date).

The repository also contains a demo, main-demo-ConvLSTM.lua, which trains a simple model, model-demo-ConvLSTM.lua, using the ConvLSTM module to predict the next frame in a sequence. Unlike the model in the paper, this demo model does not explicitly estimate optical flow to generate the next frame.
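To give a flavour of the API, below is a minimal sketch of such a next-frame predictor. The constructor arguments, hyper-parameter names, and input shapes are assumptions modelled loosely on the demo files; main-demo-ConvLSTM.lua and model-demo-ConvLSTM.lua remain the authoritative reference.

require 'nn'
require 'rnn'
require 'ConvLSTM'   -- from this repository

-- hypothetical hyper-parameters; the real values live in the demo scripts
local nInput, nHidden = 1, 45    -- input planes / memory feature maps
local rho             = 5        -- number of time steps to roll the LSTM over
local kc, km          = 7, 7    -- kernel sizes for input and memory convolutions
local stride          = 1

-- ConvLSTM is a recurrent module in the rnn framework, so it can be wrapped
-- in nn.Sequencer to process a whole table of frames at once
local model = nn.Sequential()
model:add(nn.Sequencer(nn.ConvLSTM(nInput, nHidden, rho, kc, km, stride)))
-- decode the memory activations back to a frame at every time step
model:add(nn.Sequencer(nn.SpatialConvolution(nHidden, nInput, 3, 3, 1, 1, 1, 1)))

-- toy forward pass on a sequence of rho random 50x50 frames;
-- note: the real modules may require a CUDA setup (see Dependencies)
local seq = {}
for t = 1, rho do seq[t] = torch.rand(nInput, 50, 50) end
local pred = model:forward(seq)  -- table of rho predicted frames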

The ConvLSTM module can be used as is. Optionally, the untied version, implemented in the UntiedConvLSTM class, can be used instead. The latter uses a separate model, with no memory, for the first step in the sequence. This can help when training on short sequences, by reducing the impact of the first (memoryless) step on the overall training.
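Assuming the untied class mirrors ConvLSTM's constructor (a guess based on its drop-in role, not a documented guarantee), switching variants is a one-line change:

require 'UntiedConvLSTM'
-- same (assumed) arguments as nn.ConvLSTM; the first time step is handled
-- by a separate sub-model that carries no memory
local lstm = nn.UntiedConvLSTM(nInput, nHidden, rho, kc, km, stride)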

Dependencies

  • rnn: our code extends rnn by providing a spatio-temporal convolutional version of LSTM cells.
  • extracunn: contains CUDA code for the SpatialConvolutionNoBias layer and for the Huber gradient computation.
  • stn: spatial transformer modules, used for differentiable image warping.
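
A quick sanity check that the dependencies are in place (a sketch; the require names for extracunn and stn are assumptions based on the usual Torch packaging conventions):

require 'cunn'       -- CUDA backend; extracunn's kernels build on it
require 'rnn'        -- recurrent containers such as nn.Sequencer
require 'extracunn'  -- assumed name: SpatialConvolutionNoBias, Huber gradients
require 'stn'        -- assumed name: spatial transformer modules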

To cite our paper/code:

@inproceedings{PatrauceanHC16,
  author    = {Viorica P{\u a}tr{\u a}ucean and
               Ankur Handa and
               Roberto Cipolla},
  title     = {Spatio-temporal video autoencoder with differentiable memory},
  booktitle = {International Conference on Learning Representations (ICLR) Workshop},
  year      = {2016}
}