csrhddlam/axial-deeplab

Question about table 9 in paper

CoinCheung opened this issue · 3 comments

Hi,

Thanks for the work. I noticed from Table 9 in the paper that performance is relatively stable whether the output stride is 16 or 32, and whether or not the axial decoder is used. Have you noticed this in practice, and does it mean we can simply use an output stride of 32 without the axial decoder, which would make the model much more lightweight?

Output stride 16 vs. 32: Yes, we noticed it in practice. However, we also noticed that this conclusion does not generalize to COCO. If I remember correctly, output stride 16 is better than 32 on COCO.

Axial-Decoder vs. Conv-Decoder: They do perform similarly. Separable conv seems good at decoding (and is lightweight).
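To see why a separable-conv decoder is lightweight, it helps to compare parameter counts. A quick back-of-the-envelope sketch (the 256-channel, 3x3 decoder stage below is a hypothetical example, not taken from the paper's configuration):

```python
def conv_params(c_in, c_out, k):
    # standard 2D convolution: one k x k filter per (input, output) channel pair
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    # depthwise k x k conv (one filter per input channel),
    # followed by a 1 x 1 pointwise conv that mixes channels
    return c_in * k * k + c_in * c_out

# hypothetical decoder stage: 256 -> 256 channels, 3x3 kernel
std = conv_params(256, 256, 3)             # 589,824 parameters
sep = separable_conv_params(256, 256, 3)   # 67,840 parameters
print(std, sep, round(std / sep, 1))       # roughly 8.7x fewer parameters
```

The savings grow with kernel size and channel count, which is why separable convs are a common choice when the decoder's accuracy is not the bottleneck.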

Thanks for telling me this. By the way, the code only supports an input size of 224 x 224; if I want inputs at other resolutions, I should modify the associated code, right?

Right, you might need to pass the resolution into the model as kernel sizes.
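A minimal sketch of what "pass the resolution in as kernel sizes" could look like: in axial attention, the span of each axial block (and hence its relative position embedding) covers the full feature-map extent along that axis, which is the input size divided by the stage's stride. The helper name `axial_spans` and the stride list below are illustrative assumptions, not the repo's actual API:

```python
def axial_spans(height, width, strides=(4, 8, 16, 32)):
    # Hypothetical helper: the axial-attention span ("kernel size") at each
    # backbone stage is the full spatial extent along that axis, i.e. the
    # input size divided by the stage's stride. Assumes the input dimensions
    # are divisible by every stride.
    spans = []
    for s in strides:
        assert height % s == 0 and width % s == 0, "input must be divisible by stride"
        spans.append((height // s, width // s))
    return spans

# default 224 x 224 input
print(axial_spans(224, 224))  # [(56, 56), (28, 28), (14, 14), (7, 7)]
```

So switching to, say, a 512 x 512 input means constructing the axial blocks (and their position embeddings) with the correspondingly larger spans rather than the 224-derived defaults.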