[ACL2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information
Primary Language: Python