Referring captioning demo not using grounding mask

Question

bhpfelix opened this issue a year ago · 1 comment

Hi, thanks for the great work! Quick question about demo/demo_refcap.py: the grounding mask is zeroed out at this line, which seems counterintuitive if we want to pass it to the cross-attention layers. Should the line be removed for proper behavior?

Answer 1 · 2023-04-28T18:31:43.000Z

Thanks so much for the question and for digging into the code so closely! That line can be removed for proper behavior; I added it for debugging purposes. Please go ahead and remove it. I will update the code shortly.
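For anyone wondering why the zeroed mask matters, here is a toy sketch in plain Python. It is not the code from demo/demo_refcap.py. It assumes the grounding mask acts as an additive bias on the attention scores, which may differ from what the repository actually does, and `attend`, `bias`, `region_a`, and `region_b` are hypothetical names made up for this illustration. The point is only that once a mask is multiplied by zero, two different referred regions produce exactly the same attention output, so the region no longer influences the result.

```python
import math

def attend(query, keys, values, region_mask, bias=4.0):
    """Single-query scaled dot-product attention with a region bias.

    Assumption for illustration: positions where region_mask is 1 get
    `bias` added to their score, steering attention toward that region.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    scores = [s + bias * m for s, m in zip(scores, region_mask)]

    # Numerically stable softmax over the positions.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the value vectors.
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Four toy positions with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
query = [0.2, 0.1]

# Two different referred regions.
region_a = [1, 1, 0, 0]
region_b = [0, 0, 1, 1]

# With the mask intact, the output depends on the region.
out_a = attend(query, keys, values, region_a)
out_b = attend(query, keys, values, region_b)

# With the mask zeroed out, both regions collapse to the same output.
zeroed_a = attend(query, keys, values, [m * 0 for m in region_a])
zeroed_b = attend(query, keys, values, [m * 0 for m in region_b])

print("intact mask differs by region:", out_a != out_b)
print("zeroed mask identical:", zeroed_a == zeroed_b)
```

Under these assumptions, removing the zeroing line is what lets the referred region reach the attention computation at all.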