GLAT_Visual_Commonsense

Code for GLAT (Global-Local Transformer), from the ECCV 2020 paper "Learning Visual Commonsense for Robust Scene Graph Generation".


Global Local Transformer for Visual Commonsense

[Caution: This repository is still under active development and not yet cleanly documented. We recommend using it only as a reference.]

In this repository, we build our Global-Local Transformer (GLAT) model on top of a selection of base scene graph generator models, including KERN, Neural Motifs, and Stanford, to improve scene graph generation by leveraging visual commonsense.
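As a rough illustration of the refinement idea (not the authors' implementation), the sketch below applies a single self-attention layer, written in plain NumPy, to hypothetical node embeddings produced by a base scene graph generator; all names, dimensions, and random weights here are invented for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(nodes, w_q, w_k, w_v):
    """Single-head self-attention over scene-graph node embeddings.

    Each node (entity or predicate) attends to every other node, so its
    refined embedding can incorporate graph-wide (global) context.
    """
    q, k, v = nodes @ w_q, nodes @ w_k, nodes @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (n_nodes, n_nodes)
    return softmax(scores, axis=-1) @ v       # (n_nodes, d)

rng = np.random.default_rng(0)
n_nodes, d = 5, 8                             # e.g. 5 graph nodes, 8-dim embeddings
nodes = rng.normal(size=(n_nodes, d))         # stand-in for base-model node features
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

refined = self_attention(nodes, w_q, w_k, w_v)
print(refined.shape)  # (5, 8)
```

In the actual model, such attention layers are stacked (with both global and local attention patterns) and trained so that the refined node embeddings produce more commonsense-consistent scene graph predictions than the base model's raw outputs.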

The corresponding paper was accepted at ECCV 2020: Alireza Zareian*, Zhecan Wang*, Haoxuan You*, Shih-Fu Chang, "Learning Visual Commonsense for Robust Scene Graph Generation", ECCV, 2020. arXiv preprint arXiv:2006.09623 (2020). (* co-first authors) [manuscript]

This repository focuses on the pretraining and independent finetuning of the GLAT model. For details on the complete scene graph generation pipeline and base model configuration, please refer to the companion repository: https://github.com/ZhecanJamesWang/GLAT_SGG