[TMM'23] Multi-modal Structure-embedding Graph Transformer for Visual Commonsense Reasoning
Primary language: Python. License: MIT.