Referring captioning demo not using grounding mask
bhpfelix opened this issue · 1 comment
bhpfelix commented
Hi, thanks for the great work! Quick question about demo/demo_refcap.py: the grounding mask is zeroed out at this line, which seems counterintuitive if we want to pass it to the cross-attention layers. Should the line be removed for proper behavior?
MaureenZOU commented
Thanks so much for the question and for digging into the code so closely! Yes, that line can be removed for proper behavior; I added it for debugging purposes. Please go ahead and remove it; I will update the code shortly.
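
For readers following along, here is a toy sketch of why a zeroed grounding mask carries no region information downstream. It is not the code from demo/demo_refcap.py: the `masked_mean` helper, the feature values, and the mask layout are all hypothetical, and how the real model consumes the mask in its cross-attention layers is not shown in this thread.

```python
def masked_mean(features, mask):
    """Average the feature values at positions where the mask is nonzero.

    Hypothetical stand-in for any operation that uses a mask to select
    the referred region; not the repository's actual implementation.
    """
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        return None  # an all-zero mask selects nothing
    return sum(selected) / len(selected)


features = [1.0, 2.0, 3.0, 4.0]   # toy per-pixel features (assumed values)
grounding_mask = [0, 1, 1, 0]     # hypothetical referred region

# With the mask intact, only the referred region contributes.
pooled = masked_mean(features, grounding_mask)

# Zeroing the mask discards the region, so nothing is selected.
zeroed_mask = [0 for _ in grounding_mask]
pooled_zeroed = masked_mean(features, zeroed_mask)

print(pooled, pooled_zeroed)
```

In this sketch the zeroed mask leaves the downstream step with no indication of which region was referred to, which matches the concern raised in the question.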