Seemingly missing reshaping operation in point prompt encoding in PromptEncoder
lppllppl920 commented
If I read it correctly, the shape of the points input is [B, N, 2], where B is the batch size and N is the number of points per image. The padding is meant to ensure that the point prompt also contains the 2D coordinates of two points, making it compatible with the box prompt. Without a reshaping operation before the torch.cat operation, wouldn't the shape become [B, N + 1, 2] after the padding? This doesn't feel right. Since this PromptEncoder is used in SAM2 as well, it seems to impact both models.
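To make the shape question concrete, here is a minimal sketch of the padding step as I understand it. The helper name pad_points, the zero-valued padding point, and the -1 padding label are assumptions for illustration, not a copy of the actual PromptEncoder code.

```python
import torch


def pad_points(points: torch.Tensor, labels: torch.Tensor):
    """Hypothetical stand-in for the padding step described above.

    Appends one extra point and one extra label per batch element
    along dim=1, with no reshape beforehand.
    """
    # Assumed padding values: a zero point and a -1 label.
    padding_point = torch.zeros((points.shape[0], 1, 2), device=points.device)
    padding_label = -torch.ones((labels.shape[0], 1), device=labels.device)
    points = torch.cat([points, padding_point], dim=1)
    labels = torch.cat([labels, padding_label], dim=1)
    return points, labels


B, N = 4, 3                      # batch size, points per image
points = torch.rand(B, N, 2)     # [B, N, 2]
labels = torch.ones(B, N)        # [B, N]

padded_points, padded_labels = pad_points(points, labels)
print(padded_points.shape)       # torch.Size([4, 4, 2]), i.e. [B, N + 1, 2]
print(padded_labels.shape)       # torch.Size([4, 4]), i.e. [B, N + 1]
```

Under these assumptions, concatenating along dim=1 adds one entry to the point dimension, which is the [B, N + 1, 2] shape I am asking about.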
Please correct me if I misunderstand any part of this.
Thank you!