Look Twice as Much as You Say: Scene Graph Contrastive Learning for Self-Supervised Image Caption Generation

Are scene graphs good enough to improve Image Captioning?
This repository contains the PyTorch training code for Look Twice as Much as You Say: Scene Graph Contrastive Learning for Self-Supervised Image Caption Generation.
We will continue to add details and code to this repository. For the configuration of this code, refer to the butd-image-captioning repository.