/guesswhat

GuessWhat?! is a challenging task-oriented visual dialogue problem.<br>Tensorflow code for the papers, <Visual Dialogue State Tracking for Question Generation> and <Guessing State Tracking for Visual Dialogue>, accepted to appear in AAAI-2020, ECCV 2020, respectively.

Primary LanguagePython

Visual Dialogue -- GuessWhat?!

GuessWhat?! is an image object-guessing game between two players. Recently it has attracted considerable research interest in computer vision and natural language processing community.

come back

I'm back again, and I'll continue researching the GuessWhat visual dialogue task, with the help of LLM.

Special Lecture

08-07-2020, keynote "Visual question answering and dialogue" in Chinese by Prfo. Xiaojie Wang, https://ttv.cn/archives/10280

Latest Progress on the GuessWhat?! Game

Sampling accuracy on training set and validation set during Reinforencement Learning

Comparision with all the GuessWhat?! models that is known to date

Guesser QGen Max Q's NewObject_S G BS NewGame_S G BS
guesser[20] qgen[20] 5 41.6 43.5 47.1 39.2 40.8 44.6
guesser(MN)[27] TPG[27] 8 - 48.77 - - - -
guesser[19] qgen[19] 8 - 44.6 - - - -
GST(ours) qgen[19] 8 41.73 44.89 - 39.97 41.36 -
guesser[19] VDST[13] 5 45.02 49.49 - 42.92 45.94 -
guesser[19] VDST[13] 8 46.70 48.01 - 44.24 45.03 -
GST(ours) VDST[13] 5 49.55 53.35 53.17 46.95 50.58 50.71
GST(ours) VDST[13] 8 52.71 54.10 54.32 50.19 50.97 50.99
GDSE-SL[17] GDSE-SL[17] 5 - - - - 47.8 -
GDSE-SL[17] GDSE-SL[17] 8 - - - - 49.7 -
GDSE-CL[17] GDSE-CL[17] 5 - - - - 53.7 -
GDSE-CL[17] GDSE-CL[17] 8 - - - - 58.4 -
guesser[10] randQ[10] 5 - - - - 42.48 -
guesser[10] countQ[10] 5 - - - - 61.64 -
guesser(MN)[27] TPG[27] 5 62.6 - - - - -
guesser(MN)[27] TPG[27] 8 - - - - 74.3 -
guesser(MN)[27] ISM[1] - 74.4 - - 72.1 - -
guesser(MN)[27] TPG[27] 8 - 74.3 - - - -
guesser(MN)[27] ISD[2] 5 68.3 69.2 - 66.3 67.1 -
guesser[19] VQG[26] 5 63.2 63.6 63.9 59.8 60.7 60.8
guesser[19] ISM[1] - - 64.2 - - 62.1 -
guesser[19] ISD[2] 5 61.4 62.1 63.6 59.0 59.8 60.6
guesser[19] RIG(rewards)[18] 8 65.20 63.00 63.08 64.06 59.0 60.21
guesser[19] RIG(loss)[18] 8 67.19 63.19 62.57 65.79 61.18 59.79
guesser[19] qgen[19] 5 58.5 60.3 60.2 56.5 58.4 58.4
guesser[19] qgen[19] 8 62.8 58.2 53.9 60.8 56.3 52.0
guesser(MN)[27] qgen[19] 5 59.41 60.78 60.28 56.49 58.84 58.10
guesser(MN)[27] qgen[19] 8 62.05 62.73 - 59.04 59.50 -
GST(ours) qgen[19] 5 64.78 67.06 67.01 61.77 64.13 64.26
guesser[19] VDST[13] 5 66.22 67.07 67.81 63.85 64.36 64.44
guesser[19] VDST[13] 8 69.51 70.55 71.03 66.76 67.73 67.52
GST(ours) VDST[13] (ours) 5 77.38 77.30 77.23 75.11 75.20 75.13
GST(ours) VDST[13] (ours) 8 83.22 83.32 83.46 81.50 81.55 81.62
Human[19] - - - 84.4 - - 84.4 -

As shown in the uploaded figure "guesser_201911_10.png", our latest progress on GuessWhat?! game, it achieves near-perfect accuracy of 83.3% and outperforms all the previous methods. Notes that the human-level performance is 84.4%.

This research was started in Mar. 2019 and ended in Nov. 2019.

Current Vision-and-Language-and-Reasoning tasks, focuses on Visual Dialogue

LaVi Tasks conference comment
GuessWhich AAAI 2017 🐫
Multimodal Dialogs(MMD) AAAI 2018 -
CoDraw ACL 2019 -
GuessWhat?! CVPR 2017 😄
Multi-agent GuessWhich AAMAS 2019 -
Image-Chat ACL 2020
EmbodiedQA CVPR 2018
VideoNavQA BMVC 2019
GuessNumber SLT 2018
VisDial CVPR 2017 🐫
Image-Grounded Conversations(IGC) CVPR 2017
VDQG ICCV 2017
RDG-Image guessing game LREC 2014
Deal or No Deal CoRR 2017
Video-Grounded Dialogue Systems (VGDS) ACL 2019
Vision-Language Navigation (VLN) CVPR 2018
Image Captioning
Image Retrieval
Visually-grounded Referring Expressions
Multi-modal Verification ACL 2019
Viual Dialog based Referring Expression
VQA

Downloading pretrained model

We test the pretrained model on train, val and test set similar to previous work, each many times, the accuracy of the pretrained model as follows:

732,154 and 372 are the number of batch of size 64 in train, val and test set respectively.

1:

New Objects <<<
100%|██████████| 732/732 [05:44<00:00, 2.86it/s]
Accuracy (train - greedy): 0.8382912339188785
ErroRate (train - greedy): 0.1617087660811215
100%|██████████| 732/732 [05:12<00:00, 3.28it/s]
Accuracy (train - sampling): 0.8304910886011027
ErroRate (train - sampling): 0.16950891139889734
valid set <<<
100%|██████████| 154/154 [01:08<00:00, 3.15it/s]
Accuracy (valid - greedy): 0.8300487606663958
ErroRate (valid - greedy): 0.16995123933360423
100%|██████████| 154/154 [01:14<00:00, 3.24it/s]
Accuracy (valid - sampling): 0.8212108898821617
ErroRate (valid - sampling): 0.17878911011783827
New Games <<<
100%|██████████| 372/372 [02:54<00:00, 3.10it/s]
Accuracy (test - greedy): 0.815471936094177
ErroRate (test - greedy): 0.184528063905823
100%|██████████| 372/372 [02:50<00:00, 3.31it/s]
Accuracy (test - sampling): 0.8124027748581039
ErroRate (test - sampling): 0.18759722514189614
------------------------------------------------<<<


2:

New Objects <<<
100%|██████████| 732/732 [05:41<00:00, 4.02it/s]
Accuracy (train - greedy): 0.8367312048553234
ErroRate (train - greedy): 0.16326879514467663
valid set <<<
100%|██████████| 154/154 [01:06<00:00, 3.75it/s]
Accuracy (valid - greedy): 0.8267980495733441
ErroRate (valid - greedy): 0.17320195042665587
New Games <<<
100%|██████████| 372/372 [02:37<00:00, 3.71it/s]
Accuracy (test - greedy): 0.815471936094177
ErroRate (test - greedy): 0.184528063905823
------------------------------------------------<<<


3:

New Objects <<<
100%|██████████| 732/732 [05:09<00:00, 3.46it/s]
Accuracy (train - greedy): 0.83559858101466
ErroRate (train - greedy): 0.16440141898534
valid set <<<
100%|██████████| 154/154 [00:58<00:00, 3.33it/s]
Accuracy (valid - greedy): 0.8278138967899228
ErroRate (valid - greedy): 0.1721861032100772
New Games <<<
100%|██████████| 372/372 [02:45<00:00, 3.32it/s]
Accuracy (test - greedy): 0.815471936094177
ErroRate (test - greedy): 0.184528063905823
------------------------------------------------<<<


4:

New Objects <<<
100%|██████████| 732/732 [05:28<00:00, 3.99it/s]
Accuracy (train - greedy): 0.8344232166517075
ErroRate (train - greedy): 0.1655767833482925
valid set <<<
100%|██████████| 154/154 [01:03<00:00, 3.57it/s]
Accuracy (valid - greedy): 0.8300487606663958
ErroRate (valid - greedy): 0.16995123933360423
New Games <<<
100%|██████████| 372/372 [02:44<00:00, 3.73it/s]
Accuracy (test - greedy): 0.815471936094177
ErroRate (test - greedy): 0.184528063905823
------------------------------------------------<<<

Training