chitwansaharia/HACAModel

Implementation of "Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning" (https://arxiv.org/abs/1804.05448)

Python

HACAModel

Requirements:

tensorboardX
pytorch
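
The requirements above can be checked before running anything. The snippet below is a minimal sketch, not part of the repository: the helper name missing_modules is hypothetical, and it assumes the usual import names (PyTorch is imported as torch, tensorboardX as tensorboardX).

```python
import importlib.util

# Import names for the two requirements listed above.
# Assumption: PyTorch is installed under its usual import name "torch".
REQUIRED_MODULES = ["torch", "tensorboardX"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing requirements:", ", ".join(missing))
    else:
        print("All requirements found.")
```

It only reports whether the modules can be located; it does not check versions, since the README does not state any.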

A usage example is provided in slurm.sh.