GuessWhich

GuessWhich is a cooperative image-guessing game between two agents, Q-BOT and A-BOT. It is similar to GuessWhat?!, an object-guessing game played on an image between two players.

GuessWhich is a two player game played by Qbot and Abot. The goal of GuessWhich is to figure out a correct answer out of 9,628 test images by asking a sequence of questions. Abot can see the randomly assigned target image, which is unknown to Qbot. Qbot only observes a caption of the image generated from Neuraltalk2 (Vinyals & Le, 2015). To achieve the goal, Qbot asks a series of questions, to which Abot responds with a sentence. [Quoted from Sang-Woo Lee et al., "Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation", ICLR 2019.]

The two agents communicate in natural language. At the start, both can see a larger set of images, from which A-BOT randomly selects one as the secret image, unknown to Q-BOT. Q-BOT asks a sequence of free-form natural language questions, and A-BOT responds with free-form answers. At the end, Q-BOT tries to identify the secret image from the fixed pool of images. If it picks the right image, the dialogue counts as a success; otherwise, it is a failure.
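The protocol described above can be sketched as a short game loop. This is a minimal illustration only, not the code in this repository: the function `play_guesswhich`, the toy agents, and the attribute-dictionary "images" are all hypothetical stand-ins for the real neural Q-BOT and A-BOT and for real image features.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Image = Dict[str, str]            # toy stand-in for a real image (assumption)
Dialog = List[Tuple[str, str]]    # list of (question, answer) pairs


def play_guesswhich(
    pool: Sequence[Image],
    ask: Callable[[Dialog], str],
    answer: Callable[[Image, str], str],
    guess: Callable[[Sequence[Image], Dialog], int],
    num_rounds: int,
    rng: random.Random,
) -> Tuple[bool, Dialog]:
    """Run one game and return (success, dialogue history)."""
    secret = rng.randrange(len(pool))            # chosen at random, hidden from Q-BOT
    dialog: Dialog = []
    for _ in range(num_rounds):
        question = ask(dialog)                   # Q-BOT only sees the dialogue so far
        reply = answer(pool[secret], question)   # A-BOT sees the secret image
        dialog.append((question, reply))
    return guess(pool, dialog) == secret, dialog


# Hypothetical toy agents: questions are attribute names, answers are values.
ATTRIBUTES = ["animal", "color"]


def toy_ask(dialog: Dialog) -> str:
    return ATTRIBUTES[len(dialog) % len(ATTRIBUTES)]


def toy_answer(image: Image, question: str) -> str:
    return image.get(question, "unknown")


def toy_guess(pool: Sequence[Image], dialog: Dialog) -> int:
    # Pick the first image consistent with every answer given so far.
    for index, image in enumerate(pool):
        if all(image.get(q) == a for q, a in dialog):
            return index
    return 0


POOL = [
    {"animal": "cat", "color": "black"},
    {"animal": "cat", "color": "white"},
    {"animal": "dog", "color": "black"},
]

success, history = play_guesswhich(
    POOL, toy_ask, toy_answer, toy_guess, num_rounds=2, rng=random.Random(0)
)
print(success, history)
```

Passing the three agent behaviours in as callables keeps the loop independent of how questions, answers, and guesses are produced, so learned models could be substituted for the toy functions without changing the loop itself.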

1. under review...

2. ...

Acknowledgements

This PyTorch implementation is based on the PyTorch code of Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning [Das & Kottur et al., ICCV 2017]. GitHub: https://github.com/batra-mlp-lab/visdial-rl

PyTorch environment

PyTorch version: 1.2.0
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: TITAN V
GPU 1: TITAN V

Nvidia driver version: 410.79
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] msgpack-numpy==0.4.3.2
[pip3] numpy==1.16.2
[pip3] numpydoc==0.8.0
[pip3] torch==1.2.0
[pip3] torchfile==0.1.0
[pip3] torchtext==0.7.0
[conda] Could not collect
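
The listing above follows the format printed by `python -m torch.utils.collect_env`. As a rough way to compare a local setup against it, the sketch below parses the `[pip3] name==version` lines and checks them against the installed packages. The helper names `parse_pins`, `installed_version`, and `compare_versions` are hypothetical, and the default lookup assumes Python 3.8 or newer for `importlib.metadata`, which is newer than the Python 3.6 listed here.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

LISTING = """\
[pip3] numpy==1.16.2
[pip3] torch==1.2.0
[pip3] torchfile==0.1.0
[pip3] torchtext==0.7.0
[conda] Could not collect
"""


def parse_pins(text: str) -> Dict[str, str]:
    """Extract {package: version} from '[pip3] name==version' style lines."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[") and "]" in line and "==" in line:
            spec = line.split("]", 1)[1].strip()
            name, version = spec.split("==", 1)
            pins[name] = version
    return pins


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Label each listed package as 'ok', 'missing', or a mismatch message."""
    report: Dict[str, str] = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        elif found == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (installed {found}, listed {wanted})"
    return report


for package, status in compare_versions(parse_pins(LISTING)).items():
    print(f"{package}: {status}")
```

A mismatch does not necessarily mean the code will fail; the check only reports where a local environment differs from the one listed.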

Challenges

GuessWhich is a challenging vision-and-language problem. It requires processing a large pool of images and modeling the mental imagery that a multi-round natural language dialogue of question-answer pairs evokes in a human.

Performance

Current vision-and-language reasoning tasks, with a focus on visual dialogue:

LaVi Task	Conference	Comment
GuessWhich	AAAI 2017	🐫
Multimodal Dialogs (MMD)	AAAI 2018	-
CoDraw	ACL 2019	-
GuessWhat?!	CVPR 2017	😄
Multi-agent GuessWhich	AAMAS 2019	-
Image-Chat	ACL 2020
EmbodiedQA	CVPR 2018
VideoNavQA	BMVC 2019
GuessNumber	SLT 2018
VisDial	CVPR 2017	🐫
Image-Grounded Conversations (IGC)	CVPR 2017
VDQG	ICCV 2017
RDG-Image guessing game	LREC 2014
Deal or No Deal	CoRR 2017
Video-Grounded Dialogue Systems (VGDS)	ACL 2019
Vision-Language Navigation (VLN)	CVPR 2018
Image Captioning
Image Retrieval
Visually-grounded Referring Expressions
Multi-modal Verification	ACL 2019
Visual Dialog based Referring Expression
VQA
