batra-mlp-lab/visdial-rl

So many UNK in captions

okisy opened this issue · 3 comments

okisy commented

Thanks to your help, I managed to run your code.
I have one more question, though.
I ran

python evaluate.py -useGPU \
    -startFrom checkpoints/abot_rl_ep20.vd \
    -qstartFrom checkpoints/qbot_rl_ep20.vd \
    -evalMode dialog \
    -cocoDir /my/path/to/coco/images/ \
    -cocoInfo /my/path/to/coco.json \
    -beamSize 5

then implemented

cd dialog_output/
python -m http.server 8000

However, I found that the visualized captions were quite different from those in your picture: there were many "UNK" tokens in my result. Is that expected?
And could you tell me under what conditions I could reproduce results similar to yours?

[screenshots: dialog visualization from the RL epoch-20 checkpoints, showing many UNK tokens]

Regarding why your dialog visualization does not match the figure in the README: the command was missing a line that passes the generated caption file as input instead of the ground-truth one. 7f3e7e2 fixes this; the updated command should now give a similar dialog visualization.

Coming back to the UNKs in the ground-truth captions: it seems some of the UNKs at the start are for the word "a", which is odd because the same word is not a UNK elsewhere. This might be a preprocessing issue; I will look into it.

okisy commented

Thanks for your quick response.
In addition, I would like to ask about a minor setting related to this.
When you generated the figure in the README, which "inference" mode (greedy or sample) did you choose?

beamSize=beamSize, inference='greedy')
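For context, the difference between the two modes can be sketched as follows. This is a minimal illustration, not the repo's actual decoder; the toy vocabulary and `step_probs` distributions are made up for the example:

```python
import random

# Toy per-step token distributions over a tiny vocabulary
# (hypothetical stand-ins for the decoder's softmax outputs).
vocab = ["a", "cat", "on", "the", "mat", "<END>"]
step_probs = [
    [0.70, 0.10, 0.05, 0.10, 0.03, 0.02],  # step 1: "a" most likely
    [0.05, 0.60, 0.10, 0.10, 0.10, 0.05],  # step 2: "cat" most likely
    [0.02, 0.02, 0.02, 0.02, 0.02, 0.90],  # step 3: "<END>" most likely
]

def decode(step_probs, inference="greedy", seed=0):
    """Greedy takes the argmax token at each step; 'sample' draws
    from the distribution, so repeated runs can differ."""
    rng = random.Random(seed)
    tokens = []
    for probs in step_probs:
        if inference == "greedy":
            idx = max(range(len(probs)), key=probs.__getitem__)
        else:  # inference == "sample"
            idx = rng.choices(range(len(probs)), weights=probs)[0]
        if vocab[idx] == "<END>":
            break
        tokens.append(vocab[idx])
    return tokens

print(decode(step_probs, inference="greedy"))  # deterministic: ['a', 'cat']
```

With `inference='greedy'` the output is deterministic, which matters if you are trying to reproduce the README figure exactly; sampling will vary run to run.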

@nirbhayjm I think the UNKs are produced whenever there is a capitalized letter in the caption or question, not specifically for the word "a".
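The suspected bug can be sketched as a case-sensitive vocabulary lookup that maps capitalized words to UNK even when the lowercase form is in the vocabulary. This is a hypothetical reconstruction, not the repo's actual preprocessing code; `word2ind` and its indices are made up for the example:

```python
# Hypothetical word-to-index vocabulary, built from lowercase tokens only.
word2ind = {"a": 4, "man": 17, "riding": 52, "horse": 88, "UNK": 0}

def encode(caption, lowercase=False):
    """Map each token to its vocabulary index, falling back to UNK.
    Without lowercasing, sentence-initial capitalized words miss."""
    tokens = caption.split()
    if lowercase:
        tokens = [t.lower() for t in tokens]
    return [word2ind.get(t, word2ind["UNK"]) for t in tokens]

# The sentence-initial "A" becomes UNK (index 0) without lowercasing:
print(encode("A man riding a horse"))                  # [0, 17, 52, 4, 88]
print(encode("A man riding a horse", lowercase=True))  # [4, 17, 52, 4, 88]
```

If this is what is happening, lowercasing captions during preprocessing (before the vocabulary lookup) would explain why "a" is fine mid-sentence but becomes UNK at the start.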
