Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"
Primary language: Python