ClipCap

Using pretrained encoder and language models to generate captions from multimedia inputs.

Primary Language: Python
