# Sequence-to-sequence Video Captioning System

An implementation of a sequence-to-sequence video captioning system inspired by the paper "Sequence to Sequence – Video to Text" by Subhashini Venugopalan et al. An end-to-end sequence-to-sequence model generates captions for videos. It uses recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. The LSTM model is trained on video-sentence pairs and learns to map a sequence of video frames to a sequence of words, producing a description of the event in the video clip.
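As a rough illustration of the two-phase flow described above (not the repository's actual code), the sketch below runs a single tiny LSTM, implemented from scratch in numpy with random untrained weights, first over a sequence of precomputed frame features (encoding) and then over padding inputs while emitting word indices greedily (decoding). All sizes, weight names, and the greedy decode length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    # One LSTM step: gates (input, forget, output, candidate) are stacked in W/U/b.
    z = W @ x + U @ h + b
    H = h.size
    i = 1 / (1 + np.exp(-z[:H]))        # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))     # forget gate
    o = 1 / (1 + np.exp(-z[2*H:3*H]))   # output gate
    g = np.tanh(z[3*H:])                # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

# Toy sizes (assumptions): 4-dim frame features, hidden size 8, 5-word vocabulary.
D, H, V = 4, 8, 5
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
Wy = rng.normal(0, 0.1, (V, H))         # hidden state -> vocabulary logits

frames = rng.normal(size=(6, D))        # stand-in for 6 frames of CNN features
h, c = np.zeros(H), np.zeros(H)

# Encoding phase: read the whole frame sequence, no output emitted.
for x in frames:
    h, c = lstm_step(x, h, c, W, U, b)

# Decoding phase: emit a fixed-length caption greedily, feeding a zero
# "padding" input at each step; the state carries the video summary.
caption = []
for _ in range(3):
    h, c = lstm_step(np.zeros(D), h, c, W, U, b)
    caption.append(int(np.argmax(Wy @ h)))

print(caption)  # word indices into a hypothetical vocabulary
```

In training, the decoding phase would instead be fed the ground-truth caption words and optimized with cross-entropy loss over the vocabulary logits; the sketch only shows the forward data flow.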
