Linwei-Chen/FreqFusion

Size prob

Opened this issue · 5 comments

When the size of hr isn't exactly 2x that of lr, how can I use your module?

Sometimes the hr size is odd, so downsampling and then re-upsampling loses a pixel: 27x27 -> 13x13 -> 26x26.

I ran into the same problem during training. The five scales are 184x264, 92x132, 46x66, 23x33, and 12x17. As you can see, the fourth level already has odd dimensions, so when the fifth level's lr is multiplied by 2 for CARAFE upsampling, the assertion `assert masks.size(-1) == features.size(-1) * scale_factor` fails.
The original FasterRCNN-ResNet101 has the same problem.

Could you advise how to fix this?
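The mismatch above can be reproduced with simple arithmetic. The sketch below assumes the backbone's stride-2 convolutions (with padding) round each spatial size up, i.e. ceil(size / 2) per level; `pyramid_sizes` is a hypothetical helper, not part of FreqFusion:

```python
def pyramid_sizes(h, w, levels):
    """Spatial sizes of a feature pyramid where each stride-2 stage
    with padding produces ceil(size / 2)."""
    sizes = [(h, w)]
    for _ in range(levels - 1):
        h, w = (h + 1) // 2, (w + 1) // 2
        sizes.append((h, w))
    return sizes

sizes = pyramid_sizes(184, 264, 5)
print(sizes)  # [(184, 264), (92, 132), (46, 66), (23, 33), (12, 17)]
# Upsampling the 12x17 map by 2 gives 24x34, not 23x33, so the assert fails.
```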

The best solution is to pad (or resize) the input to ensure it can be evenly divided by 64. You can modify the configuration in MMDetection by:

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=64), ## This line
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
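To see why `size_divisor=64` works: 184x264 is padded up to 192x320, and every pyramid level then halves exactly, so each lr doubled matches its hr. A quick arithmetic sketch (`pad_to_divisor` is a hypothetical helper mirroring what the `Pad` transform does):

```python
def pad_to_divisor(h, w, d=64):
    # Round each side up to the next multiple of d, as Pad(size_divisor=d) does.
    return ((h + d - 1) // d) * d, ((w + d - 1) // d) * d

h, w = pad_to_divisor(184, 264)
sizes = [(h, w)]
for _ in range(4):
    h, w = h // 2, w // 2
    sizes.append((h, w))
print(sizes)  # [(192, 320), (96, 160), (48, 80), (24, 40), (12, 20)]
# Every level halves exactly, so lr * 2 == hr and CARAFE's assert holds.
```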

Thank you, bro.

Thanks for the reply.

If you are debugging with FreqFusion.py independently, you can use the following code to pad the image before feeding it to the network:

import torch.nn.functional as F

class Normalization_Pad():
    def __init__(self, size_divisor):
        # Pad to a multiple of 2 ** size_divisor, where size_divisor
        # is the number of downsampling stages.
        self.factor = 2 ** size_divisor

    def pad(self, image):
        h, w = image.shape[2], image.shape[3]
        # Amount needed to round each side up to the next multiple of factor
        # (zero when the side is already divisible).
        padh = (self.factor - h % self.factor) % self.factor
        padw = (self.factor - w % self.factor) % self.factor
        image = F.pad(image, (0, padw, 0, padh), mode='reflect')
        return image

In the code above, size_divisor equals the number of downsampling stages.
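As a quick sanity check, the same padding arithmetic can be exercised on a dummy tensor (a sketch assuming 5 downsampling stages and a 184x264 input, matching the scales from the question):

```python
import torch
import torch.nn.functional as F

factor = 2 ** 5  # size_divisor = 5 downsampling stages -> multiple of 32
img = torch.randn(1, 3, 184, 264)
h, w = img.shape[2], img.shape[3]
# Round each side up to the next multiple of factor.
padh = (factor - h % factor) % factor
padw = (factor - w % factor) % factor
padded = F.pad(img, (0, padw, 0, padh), mode='reflect')
print(tuple(padded.shape))  # (1, 3, 192, 288)
```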