WingsBrokenAngel/delving-deeper-into-the-decoder-for-video-captioning

Inference on single video

nikky4D opened this issue · 3 comments

Hi,

Do you have a demo.py/ipynb that I can use to run inference on a single video and see the generated captions? If not, can you describe how I can set this up?

Thanks

  1. Encoder part: use ResNeXt, ECO, and the Semantic Detection Network to extract features from a video clip.
  2. Decoder part: feed those features into the captioning model as inputs (see the sketch after this list).
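
A minimal, hypothetical sketch of what single-video inference could look like under those two steps. The extractor callables and the decoder's `next_word` method are placeholders, not this repository's actual API; they stand in for the ResNeXt/ECO/semantic networks and the trained captioning model.

```python
import numpy as np

def caption_video(frames, extractors, decoder, max_len=20):
    """frames: sampled video frames; extractors: dict of feature functions;
    decoder: trained captioning model with an assumed step-wise `next_word` method."""
    # 1) Encoder part: clip-level appearance, motion, and semantic features.
    feats = np.concatenate([
        extractors["resnext"](frames),    # e.g. 2048-d appearance vector
        extractors["eco"](frames),        # e.g. 1536-d motion vector
        extractors["semantic"](frames),   # e.g. semantic tag probabilities
    ])
    # 2) Decoder part: greedy decoding conditioned on the fused features.
    words = []
    for _ in range(max_len):
        word = decoder.next_word(feats, words)  # hypothetical step API
        if word == "<EOS>":
            break
        words.append(word)
    return " ".join(words)
```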

@WingsBrokenAngel
Can you please provide code/repo links on how to go about feature extraction for the encoder part?

ResNeXt can be found in tensornets, and ECO can be found in ECO-efficient-video-understanding.
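
For the ResNeXt side, a minimal sketch of per-frame feature extraction with tensornets (TensorFlow 1.x style) might look like the following. The `ResNeXt101` variant, the `stem=True` option, the 224x224 input size, and the global-average-pooling step are assumptions; check the tensornets documentation and the paper for the exact settings used.

```python
import tensorflow as tf
import tensornets as nets

# Placeholder for a batch of cropped frames.
inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
# stem=True is assumed to return the convolutional feature map instead of logits.
model = nets.ResNeXt101(inputs, is_training=False, stem=True)
# Global average pool over the spatial dims -> one feature vector per frame.
features = tf.reduce_mean(model, axis=[1, 2])

with tf.Session() as sess:
    sess.run(model.pretrained())  # load the ImageNet-pretrained weights
    # 'frame_000.jpg' is a hypothetical path to one sampled video frame.
    imgs = nets.utils.load_img('frame_000.jpg', target_size=256, crop_size=224)
    feats = sess.run(features, {inputs: model.preprocess(imgs)})
    print(feats.shape)  # expected to be (1, feature_dim)
```

Running this over uniformly sampled frames of a clip and averaging (or stacking) the resulting vectors would give the appearance feature consumed by the decoder; the ECO motion features and the semantic tags would be produced analogously with their respective repositories.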