VideoCaption

Video captioning using an LSTM and a CNN, following the paper "Sequence to Sequence - Video to Text". This is the Visual Learning project by Rui Zhang, Yujia Huang and Yu Zhang. NeuralTalk2 by Karpathy was used as a reference.

## Model

As stated in the paper, the model can be divided into two parts: an encoder and a decoder. The two share the same architecture but are used differently: during encoding, frames are fed into the network and its outputs are ignored; during decoding, both frames and words are fed in, and the network's outputs are taken as the final output.
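The encoding/decoding split described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the recurrent cell is a toy tanh unit standing in for the LSTM, and all names and sizes (`make_cell`, `caption_pass`, `FRAME_DIM`, `WORD_DIM`, `HIDDEN`) are hypothetical. Zero-padding the word slot during encoding and the frame slot during decoding is also an assumption made for the sketch.

```python
import math
import random

FRAME_DIM = 3   # hypothetical frame-feature size (real CNN features are far larger)
WORD_DIM = 2    # hypothetical word-embedding size
HIDDEN = 4      # hypothetical hidden-state size


def make_cell(input_size, hidden_size, seed=0):
    """Return a toy tanh recurrent step; a stand-in for the LSTM, not an LSTM."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(hidden_size)]

    def step(x, h):
        # New hidden state from the current input x and previous state h.
        return [
            math.tanh(sum(w * v for w, v in zip(w_in[j], x))
                      + sum(w * v for w, v in zip(w_h[j], h)))
            for j in range(hidden_size)
        ]

    return step


def caption_pass(step, frames, decode_frames, words):
    """Run one shared cell through the encoding and decoding phases."""
    h = [0.0] * HIDDEN
    # Encoding: frames are fed in, the word slot is zero-padded (assumption),
    # and the per-step outputs are discarded.
    for frame in frames:
        h = step(frame + [0.0] * WORD_DIM, h)
    # Decoding: the same cell receives a frame slot and a word slot,
    # and its per-step outputs are kept.
    outputs = []
    for frame, word in zip(decode_frames, words):
        h = step(frame + word, h)
        outputs.append(h)
    return outputs


step = make_cell(FRAME_DIM + WORD_DIM, HIDDEN)
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
padding = [[0.0] * FRAME_DIM for _ in words]  # assumption: zero frames while decoding
outputs = caption_pass(step, frames, padding, words)
print(len(outputs), "decoder outputs of size", len(outputs[0]))
```

In a full model, each kept output would presumably be projected onto the vocabulary to pick the next word; that step is left out here to keep the sketch focused on which inputs are fed and which outputs are used in each phase.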
