Understanding predicting multiple masks
joshmyersdean commented
Hello @hanoonaR!
I’m currently extending your model and trying to better understand how multiple masks are handled during training, particularly with respect to multiple `[SEG]` tokens. I noticed that when `batch_size=1`, this line appears to calculate only a single mask, even when multiple `[SEG]` tokens are predicted.
As a result, it seems that `seg_token_offset` would have shape `1xM` (where `M` is the number of predicted `[SEG]` tokens), and the following loop:

```python
for i in range(len(seg_token_offset) - 1):
```

would only execute once.
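For concreteness, here is a minimal, simplified sketch of how I understand the offset-based grouping to work. This is my own illustration with hypothetical names (`group_seg_embeddings`, plain Python lists in place of tensors), not the repo's actual code, so please correct me if I have the mechanism wrong:

```python
from itertools import accumulate

def group_seg_embeddings(pred_embeddings, seg_token_counts):
    # seg_token_counts[b] = number of [SEG] tokens predicted for batch item b.
    # Prepending 0 to the cumulative sum gives slice boundaries per item,
    # e.g. counts [3] -> offsets [0, 3].
    seg_token_offset = [0] + list(accumulate(seg_token_counts))
    grouped = []
    # With batch_size=1 this loop runs exactly once, but the single slice
    # still contains all M [SEG] embeddings for that item.
    for i in range(len(seg_token_offset) - 1):
        start, end = seg_token_offset[i], seg_token_offset[i + 1]
        grouped.append(pred_embeddings[start:end])
    return grouped

# Example: one batch item with 3 predicted [SEG] tokens.
emb = ["e0", "e1", "e2"]  # stand-ins for 3 [SEG] token embeddings
out = group_seg_embeddings(emb, [3])
# len(out) == 1 (one batch item); out[0] holds all 3 embeddings
```

If this reading is right, the loop executing once with `batch_size=1` would still yield `M` masks from the single slice, which is what I am trying to confirm.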
Could you clarify whether this behavior is intended, or whether I’m missing something about how multiple masks should be output in this case?
Thanks in advance for your insights!
Best,
Josh