'a mask prediction' in Sec. 3.2.2 of Paper

Question

Closed this issue 2 months ago · 5 comments

Huster-Hq commented 3 months ago

Is the mask prediction single channel, i.e., H×W×1?

hkchengrex commented 3 months ago

Yes.

Answer 1 · 2024-07-12T09:07:11.000Z

I have a question about the details of Object Memory:

The object memory is computed from $N$ pooling masks $W$. However, these pooling masks have no constraint label, unlike the mask $M_l$, which is projected from the pixel features and constrained by the GT mask. I don't understand what information these pooling masks contain, or why one half can be foreground predictions and the other half background predictions. Have you visualized these masks directly?

Answer 2 · 2024-07-12T13:47:50.000Z

Isn't $W$ generated from the memory feature $F$ through an MLP?

Answer 3 · 2024-07-12T13:50:27.000Z

What do you mean by "constraint label"? $W$ is directly constructed from $M_l$ in the screenshot that you provided. There are no additional transformations. Those masks are just the masks in Figure 4 (and their inverse).

Figure 4 shows $M_l$ rather than the pooling masks $W$.

Answer 4 · 2024-07-12T15:19:45.000Z

Oh, right. Sorry -- it slipped my mind. We have visualized them before at some point. IIRC those masks are rather diffuse and don't have very recognizable patterns. They are learned end-to-end.
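
For readers following along, here is a minimal sketch of how pooling masks of this kind might be computed and used. It is not the repository's implementation. It assumes only what the thread states: $W$ is predicted from the memory feature $F$, half of the masks are restricted to foreground pixels, and the other half to background pixels. A single linear layer followed by a sigmoid stands in for the MLP, pixels are flattened into a list, and the names `pooling_masks`, `mask_pool`, and `weights` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pooling_masks(features, object_mask, weights):
    """Compute N soft pooling masks over P flattened pixels.

    features:    P feature vectors of length C (stand-in for F).
    object_mask: P values in [0, 1], the single-channel object mask.
    weights:     N vectors of length C (assumption: one linear layer
                 replaces the MLP; these would be learned end-to-end).
    """
    n = len(weights)
    masks = []
    for q, w in enumerate(weights):
        # Assumption: first half attends to foreground, second half to background.
        wants_foreground = q < n // 2
        mask_q = []
        for feat, m in zip(features, object_mask):
            is_foreground = m >= 0.5
            if is_foreground == wants_foreground:
                logit = sum(wi * fi for wi, fi in zip(w, feat))
                mask_q.append(sigmoid(logit))
            else:
                # Pixels on the wrong side of the object mask are zeroed out.
                mask_q.append(0.0)
        masks.append(mask_q)
    return masks


def mask_pool(masks, values):
    """Average the value vectors under each mask, giving one vector per mask."""
    dim = len(values[0])
    pooled = []
    for mask_q in masks:
        total = sum(mask_q)
        if total == 0.0:
            pooled.append([0.0] * dim)
            continue
        pooled.append(
            [sum(wt * v[d] for wt, v in zip(mask_q, values)) / total
             for d in range(dim)]
        )
    return pooled
```

In this sketch the object mask only gates where each pooling mask may be non-zero. The values inside the permitted region come from the learned projection, so no per-mask supervision is involved, which fits the description of the masks as diffuse.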