Uses a 3D convolution network over 20 timesteps explained in "Learning Spatiotemporal Features with 3D Convolutional Networks". Initially, predictions are inaccurate, but becomes better after 9 minutes. Better hyperparameter tuning or network choice is needed.
Video: YouTube
- Use of a CNN-LSTM network to process semantic information in an image, and sequential information of subsequent frames (In process)
- Implementing an optical flow model