csrhddlam/axial-deeplab

Question about table 9 in paper

CoinCheung opened this issue · 3 comments

Hi,

Thanks for the work. I noticed from Table 9 in the paper that performance is relatively stable whether the output stride is 16 or 32, and whether or not the axial decoder is used. Have you noticed this in practice, and does it mean we can simply use an output stride of 32 without the axial decoder, which would make the model much more lightweight?

Output stride 16 vs. 32: Yes, we noticed it in practice. However, we also noticed that this conclusion does not generalize to COCO. If I remember correctly, output stride 16 is better than 32 on COCO.

Axial-Decoder vs. Conv-Decoder: They do perform similarly. Separable conv seems good at decoding (and is lightweight).
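To see why a separable-conv decoder is lightweight, it helps to compare parameter counts. A quick back-of-the-envelope sketch (the 256-channel, 3x3 decoder stage below is a hypothetical example, not taken from the paper's configuration):

```python
def conv_params(c_in, c_out, k):
    # standard 2D convolution: one k x k filter per (input, output) channel pair
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    # depthwise k x k conv (one filter per input channel),
    # followed by a 1 x 1 pointwise conv that mixes channels
    return c_in * k * k + c_in * c_out

# hypothetical decoder stage: 256 -> 256 channels, 3x3 kernel
std = conv_params(256, 256, 3)             # 589,824 parameters
sep = separable_conv_params(256, 256, 3)   # 67,840 parameters
print(std, sep, round(std / sep, 1))       # roughly 8.7x fewer parameters
```

The savings grow with kernel size and channel count, which is why separable convs are a common choice when the decoder's accuracy is not the bottleneck.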

Thanks for telling me this. By the way, the code only supports an input size of 224 x 224; if I want inputs at other resolutions, I should modify the associated code, right?

Right, you might need to pass the resolution into the model as kernel sizes.
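A minimal sketch of what "pass the resolution in as kernel sizes" could look like: in axial attention, the span of each axial block (and hence its relative position embedding) covers the full feature-map extent along that axis, which is the input size divided by the stage's stride. The helper name `axial_spans` and the stride list below are illustrative assumptions, not the repo's actual API:

```python
def axial_spans(height, width, strides=(4, 8, 16, 32)):
    # Hypothetical helper: the axial-attention span ("kernel size") at each
    # backbone stage is the full spatial extent along that axis, i.e. the
    # input size divided by the stage's stride. Assumes the input dimensions
    # are divisible by every stride.
    spans = []
    for s in strides:
        assert height % s == 0 and width % s == 0, "input must be divisible by stride"
        spans.append((height // s, width // s))
    return spans

# default 224 x 224 input
print(axial_spans(224, 224))  # [(56, 56), (28, 28), (14, 14), (7, 7)]
```

So switching to, say, a 512 x 512 input means constructing the axial blocks (and their position embeddings) with the correspondingly larger spans rather than the 224-derived defaults.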