dahyun-kang/ifsl

image size

Ehteshamciitwah opened this issue · 7 comments

Hello, thank you for your work. Is the model suitable for all image sizes? I checked with image sizes (256x256) and (384x384), but it raises errors about padding and dimensions. Can you please elaborate on the relation between image size and the model structure? Thank you

Hi

In the current implementation, the kernel sizes, padding, and strides are fit to the 400x400 image resolution, which yields 13x13, 25x25, and 50x50 feature maps from the ResNet intermediate layers.
But you could also adjust the sizes yourself at

def make_building_attentive_block(in_channel, out_channels, kernel_sizes, spt_strides, pool_kv=False):

self.encoder_layer4 = make_building_attentive_block(inch[0], [32, 128], [5, 3], [4, 2])
self.encoder_layer3 = make_building_attentive_block(inch[1], [32, 128], [5, 5], [4, 4], pool_kv=True)
self.encoder_layer2 = make_building_attentive_block(inch[2], [32, 128], [5, 5], [4, 4], pool_kv=True)
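For reference, a rough sketch like the one below (using a plain torchvision ResNet50 as a stand-in, which may not exactly match the backbone setup in this repo) shows which intermediate feature-map sizes a given input resolution produces:

import torch
import torchvision

# Probe the intermediate feature-map sizes for an arbitrary input resolution.
backbone = torchvision.models.resnet50(weights=None)
backbone.eval()
feat_sizes = {}

def save_size(name):
    def hook(module, inputs, output):
        feat_sizes[name] = tuple(output.shape[-2:])
    return hook

for name in ["layer2", "layer3", "layer4"]:
    getattr(backbone, name).register_forward_hook(save_size(name))

with torch.no_grad():
    backbone(torch.randn(1, 3, 400, 400))  # swap in 256x256, 384x384, etc.

print(feat_sizes)  # 400x400 -> layer2: (50, 50), layer3: (25, 25), layer4: (13, 13)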

Since you want to reduce the image size, I'd recommend reducing the kernel sizes and strides and finding the values that fit your resolution.
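As a quick sanity check when picking new kernels and strides, the standard strided-convolution output-size formula tells you what each block would produce for your new feature-map sizes (the padding value below is only an assumed placeholder, not necessarily the repo's setting):

def conv_out(size, kernel, stride, padding):
    # spatial size after a strided convolution (floor mode)
    return (size + 2 * padding - kernel) // stride + 1

# e.g. the 50x50 layer2 map pushed through kernels [5, 5] with strides [4, 4]
s = 50
for k, st in zip([5, 5], [4, 4]):
    s = conv_out(s, k, st, padding=2)  # padding=2 is an assumed value
    print(s)  # prints 13, then 4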
Have a good day! 😃

Best,
Dahyun

Thank you for your quick response. I will try to change it for image size (W, H) = (768, 256). Do you have a pre-trained model for custom input sizes, or do we need to train from scratch to use image sizes other than 400? Thanks.

Unfortunately, we do not have model checkpoints trained on other image sizes readily available.

Hmmm. Thanks for your responses. I have one last question: is it possible to use one model for different image sizes, or does one need to retrain for each different image size? Thanks

The model gradually reduces the predefined input sizes in a fixed strided fashion, so a model trained on image size A cannot be transferred to image size B as is.
There are two options I would suggest: 1) adjusting the model kernels & strides and retraining it from scratch for a specific image size, or 2) resizing the input correlation to fit the pretrained checkpoints.
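For option 2, a rough sketch of resizing the correlation tensor back to the default spatial sizes could look like the following; note that the [B, C, Hq, Wq, Hs, Ws] layout is my assumption and may not match the exact layout in this codebase:

import torch
import torch.nn.functional as F

def resize_correlation(corr, target_q, target_s):
    # Resize a 6D correlation tensor assumed to be laid out as
    # [B, C, Hq, Wq, Hs, Ws] so its query- and support-side spatial dims
    # match what the pretrained encoder expects (e.g. 13 / 25 / 50).
    B, C, Hq, Wq, Hs, Ws = corr.shape

    # resize the support-side dims, folding everything else into the batch
    x = corr.reshape(B * C * Hq * Wq, 1, Hs, Ws)
    x = F.interpolate(x, size=(target_s, target_s), mode='bilinear', align_corners=True)
    x = x.reshape(B, C, Hq, Wq, target_s, target_s)

    # resize the query-side dims the same way
    x = x.permute(0, 1, 4, 5, 2, 3).reshape(B * C * target_s * target_s, 1, Hq, Wq)
    x = F.interpolate(x, size=(target_q, target_q), mode='bilinear', align_corners=True)
    x = x.reshape(B, C, target_s, target_s, target_q, target_q)

    # back to [B, C, target_q, target_q, target_s, target_s]
    return x.permute(0, 1, 4, 5, 2, 3).contiguous()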

Hello, as per your suggestion, I resized the correlation tensor to the default size before the encoder and used the pre-trained weights. But after 150 epochs the mIoU is only 40.

Yes, a lower-resolution feature naturally carries less fine detail, hence the lower segmentation performance 🙂