Linwei-Chen/FreqFusion

Size prob

Opened this issue · 5 comments

When the size of hr isn't exactly 2x that of lr, how can I use your module?

Sometimes the hr size is odd, so downsampling and then re-upsampling loses a pixel: 27x27 -> 13x13 -> 26x26.

I ran into the same problem during training. The five scales are 184x264, 92x132, 46x66, 23x33, and 12x17. As you can see, the fourth level already has odd dimensions, so when the fifth level's lr is multiplied by 2 for CARAFE upsampling, the assertion `assert masks.size(-1) == features.size(-1) * scale_factor` fails.
The original FasterRCNN-ResNet101 has the same problem.

Could you advise how to fix this?
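The mismatch above can be reproduced with simple arithmetic. The sketch below assumes the backbone's stride-2 convolutions (with padding) round each spatial size up, i.e. ceil(size / 2) per level; `pyramid_sizes` is a hypothetical helper, not part of FreqFusion:

```python
def pyramid_sizes(h, w, levels):
    """Spatial sizes of a feature pyramid where each stride-2 stage
    with padding produces ceil(size / 2)."""
    sizes = [(h, w)]
    for _ in range(levels - 1):
        h, w = (h + 1) // 2, (w + 1) // 2
        sizes.append((h, w))
    return sizes

sizes = pyramid_sizes(184, 264, 5)
print(sizes)  # [(184, 264), (92, 132), (46, 66), (23, 33), (12, 17)]
# Upsampling the 12x17 map by 2 gives 24x34, not 23x33, so the assert fails.
```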

The best solution is to pad (or resize) the input to ensure it can be evenly divided by 64. You can modify the configuration in MMDetection by:

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=64), ## This line
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
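To see why `size_divisor=64` works: 184x264 is padded up to 192x320, and every pyramid level then halves exactly, so each lr doubled matches its hr. A quick arithmetic sketch (`pad_to_divisor` is a hypothetical helper mirroring what the `Pad` transform does):

```python
def pad_to_divisor(h, w, d=64):
    # Round each side up to the next multiple of d, as Pad(size_divisor=d) does.
    return ((h + d - 1) // d) * d, ((w + d - 1) // d) * d

h, w = pad_to_divisor(184, 264)
sizes = [(h, w)]
for _ in range(4):
    h, w = h // 2, w // 2
    sizes.append((h, w))
print(sizes)  # [(192, 320), (96, 160), (48, 80), (24, 40), (12, 20)]
# Every level halves exactly, so lr * 2 == hr and CARAFE's assert holds.
```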

Thank you, bro.

Thanks for the reply.

If you are debugging with FreqFusion.py independently, you can use the following code to pad the image before feeding it to the network:

import torch.nn.functional as F

class Normalization_Pad():
    def __init__(self, size_divisor):
        # Pad to a multiple of 2 ** size_divisor, where size_divisor
        # is the number of downsampling stages.
        self.factor = 2 ** size_divisor

    def pad(self, image):
        h, w = image.shape[2], image.shape[3]
        # Amount needed to round each side up to the next multiple of factor
        # (zero when the side is already divisible).
        padh = (self.factor - h % self.factor) % self.factor
        padw = (self.factor - w % self.factor) % self.factor
        image = F.pad(image, (0, padw, 0, padh), mode='reflect')
        return image

In the code above, size_divisor equals the number of downsampling stages.
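As a quick sanity check, the same padding arithmetic can be exercised on a dummy tensor (a sketch assuming 5 downsampling stages and a 184x264 input, matching the scales from the question):

```python
import torch
import torch.nn.functional as F

factor = 2 ** 5  # size_divisor = 5 downsampling stages -> multiple of 32
img = torch.randn(1, 3, 184, 264)
h, w = img.shape[2], img.shape[3]
# Round each side up to the next multiple of factor.
padh = (factor - h % factor) % factor
padw = (factor - w % factor) % factor
padded = F.pad(img, (0, padw, 0, padh), mode='reflect')
print(tuple(padded.shape))  # (1, 3, 192, 288)
```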