Learning phrase grounding from captioned images through InfoNCE bound on mutual information
Primary LanguagePythonOtherNOASSERTION