RuntimeError: size mismatch, m1, m2
Hi, I have encountered an error while training. I am trying to train the model on the DIV2K dataset (`DIV2K_train_HR` and `DIV2K_train_LR_bicubic/X4`). After running `python create_dataset.py`, I successfully generated the data and, following `LRHR_dataset.py`, put it in the right place. When I start training, it downloads the pretrained model and then fails with this error:
```
LogHandlers setup!
21-06-15 20:41:57.700 : ===================== Selected training parameters =====================
21-06-15 20:41:57.701 : Namespace(D_init_iters=0, D_update_ratio=1, alpha=1.2, amsgrad=False, beta1_D=0.9, beta1_G=0.9, beta2_D=0.999, beta2_G=0.999, cuda=True, eps_D=1e-08, eps_G=1e-08, feature_criterion='l1', feature_weight=1.0, gan_type='ragan', gan_weight=1.0, imdbTestPath='./datasets/', imdbTrainPath='./datasets/', in_nc=3, is_mixup=True, is_train=True, lr_D=0.0001, lr_G=0.0001, lr_gamma=0.5, lr_milestones=[5000, 10000, 20000, 30000], lr_restart=None, lr_restart_weights=None, nf=64, niter=51000, numWorkers=4, patch_size=40, pixel_criterion='l1', pixel_weight=10.0, pretrain=True, pretrainedModelPath='pretrained_nets/SRResDNet/G_perceptual.pth', resdnet_depth=5, resume=True, resume_start_epoch=0, rgb_range=255, saveBest=True, saveImgsPath='results', saveLogsPath='logs', saveTrainedModelsPath='trained_nets', save_checkpoint_freq=20, save_path_best_lpips='/best_lpips/', save_path_best_psnr='/best_psnr/', save_path_netD='/netD/', save_path_netG='/netG/', save_path_training_states='/training_states/', seed=123, testBatchSize=1, test_stdn=[0.0], trainBatchSize=16, train_stdn=[0.0], tv_criterion='l1', tv_weight=1.0, upscale_factor=4, use_bn=False, use_chop=False, use_filters=True, warmup_iter=-1, weightdecay_D=0, weightdecay_G=0).
21-06-15 20:41:57.701 : ===================== Loading dataset =====================
21-06-15 20:41:57.706 : training dataset: 2400
21-06-15 20:41:57.706 : training loaders: 150
21-06-15 20:41:57.707 : testing dataset: 100
21-06-15 20:41:57.707 : testing loaders: 100
21-06-15 20:41:57.707 : ===================== Building model =====================
21-06-15 20:41:57.803 : Initialized model with pretrained net from pretrained_nets/SRResDNet/G_perceptual.pth.
Setting up Perceptual loss...
Loading model from: /home/xuwh/RJPcode/SRResCGAN-master/training_codes/modules/weights/v0.1/alex.pth
...[net-lin [alex]] initialized
...Done
21-06-15 20:42:01.452 : Network G structure: SRResDNet, with parameters: 380,356
21-06-15 20:42:01.452 : SRResDNet(
(model): ResDNet(
(conv1): Conv2d(3, 64, kernel_size=(5, 5), stride=(1, 1))
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(relu1): PReLU(num_parameters=64)
(relu2): PReLU(num_parameters=64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(relu1): PReLU(num_parameters=64)
(relu2): PReLU(num_parameters=64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
(2): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(relu1): PReLU(num_parameters=64)
(relu2): PReLU(num_parameters=64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
(3): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(relu1): PReLU(num_parameters=64)
(relu2): PReLU(num_parameters=64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
(4): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
(relu1): PReLU(num_parameters=64)
(relu2): PReLU(num_parameters=64)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
)
)
(conv_out): ConvTranspose2d(64, 3, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
(l2proj): L2Proj()
)
(noise_estimator): Wmad_estimator()
(bbproj): Hardtanh(min_val=0.0, max_val=255.0)
)
21-06-15 20:42:01.453 : Network D structure: Discriminator_VGG_128, with parameters: 14,499,401
21-06-15 20:42:01.453 : Discriminator_VGG_128(
(conv0_0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv0_1): Conv2d(64, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(bn0_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1_0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1_0): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1_1): Conv2d(128, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(bn1_1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2_0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2_0): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2_1): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(bn2_1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3_0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn3_0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3_1): Conv2d(512, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(bn3_1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv4_0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn4_0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv4_1): Conv2d(512, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(bn4_1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(linear1): Linear(in_features=8192, out_features=100, bias=True)
(linear2): Linear(in_features=100, out_features=1, bias=True)
(lrelu): LeakyReLU(negative_slope=0.2, inplace=True)
)
21-06-15 20:42:01.453 : Network F structure: VGGFeatureExtractor, with parameters: 20,024,384
21-06-15 20:42:01.453 : VGGFeatureExtractor(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): ReLU(inplace=True)
(18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(24): ReLU(inplace=True)
(25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(26): ReLU(inplace=True)
(27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace=True)
(30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(31): ReLU(inplace=True)
(32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(33): ReLU(inplace=True)
(34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
21-06-15 20:42:01.454 : ===================== start training =====================
21-06-15 20:42:01.454 : ===================== resume training =====================
21-06-15 20:42:01.454 : ===> No saved training states to resume.
21-06-15 20:42:01.454 : ===> start training from epoch: 0, iter: 0.
21-06-15 20:42:01.454 : Total # of epochs for training: 340.
21-06-15 20:42:01.454 : ===> train:: Epoch[1]
21-06-15 20:42:03.040 : ===> train:: Epoch[1] Iter-step[1]
Traceback (most recent call last):
  File "main_sr_color.py", line 1057, in <module>
    main()
  File "main_sr_color.py", line 964, in main
    current_step)
  File "main_sr_color.py", line 418, in train
    pred_g_fake = netD(filter_high(fake_H))
  File "/home/xuwh/anaconda3/envs/srrescgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xuwh/RJPcode/SRResCGAN-master/training_codes/models/discriminator_vgg_arch.py", line 57, in forward
    fea = self.lrelu(self.linear1(fea))
  File "/home/xuwh/anaconda3/envs/srrescgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xuwh/anaconda3/envs/srrescgan/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/xuwh/anaconda3/envs/srrescgan/lib/python3.6/site-packages/torch/nn/functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [16 x 12800], m2: [8192 x 100] at /opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/generic/THCTensorMathBlas.cu:290
```
I understand that for `m1: [a x b]` and `m2: [c x d]`, the multiplication requires `b = c`. But how does this error occur when I have not changed the source code at all? Is it a hyperparameter problem? I am new to deep learning, so I am really confused about why this happens and how to fix it.
Thanks in advance.
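For context, the failing multiplication can be reproduced in isolation using only the shapes printed in the error (a minimal sketch, not the repo's code):

```python
import torch

# The shapes from the traceback: a batch of 16 flattened feature vectors
# of length 12800 (m1), fed to linear1 with in_features=8192 (m2).
linear1 = torch.nn.Linear(8192, 100)
features = torch.randn(16, 12800)

try:
    linear1(features)
except RuntimeError as e:
    print(e)  # size mismatch, m1: [16 x 12800], m2: [8192 x 100]
```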
Hi, you set `patch_size=40`, but the discriminator network `netD` takes a `fake_H` input of size 128x128x3, while your SR output `fake_H` has size 160x160x3 (40 x 4 upscale). With a 160x160 input, the feature map after the discriminator's five stride-2 convolutions is 5x5x512 = 12800 features instead of the 4x4x512 = 8192 that `linear1` expects, hence the size mismatch.
Can you set `patch_size=32` (the default setting)? The issue should not occur then.
Thanks.
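To make the arithmetic concrete, here is a small sketch of how the LR patch size determines the flattened feature count reaching `linear1`, assuming the five stride-2 convolutions and 512 final channels shown in the `Discriminator_VGG_128` dump above:

```python
def disc_flatten_size(patch_size, upscale, downsamples=5, channels=512):
    """Flattened feature count reaching linear1 for a given LR patch size."""
    hr = patch_size * upscale        # SR output (fake_H) spatial size
    feat = hr // 2 ** downsamples    # after five stride-2 convolutions
    return feat * feat * channels

print(disc_flatten_size(32, 4))  # 4 * 4 * 512 = 8192  -> matches linear1
print(disc_flatten_size(40, 4))  # 5 * 5 * 512 = 12800 -> the m1 in the error
```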
Oh, thanks! It works when I set `patch_size=32`.
But it is strange that `main_sr_color.py` (around line 65) suggests you should set `patch_size=40` when `upscale=4`:

```python
parser.add_argument('--patch_size', type=int, default=32, help='patch size for training. [x2-->60,x3-->50,x4-->40]')
```

Anyway, it starts training now. I really appreciate your help, sir.
Another question, about the test image dataset. I notice in `LRHR_dataset.py`, lines 39-40:

```python
self.dataroot_hr = dataroot+'DF2K/valid/clean/'
self.dataroot_lr = dataroot+'DF2K/valid/corrupted/'
```

Does this mean I should put the HR data generated by `create_dataset.py` into `DF2K/valid/clean` and the produced noisy LR data into `DF2K/valid/corrupted`, or should I put the original non-noisy LR images of DIV2K (the dataset I use) into `DF2K/valid/clean` and the noisy LR data into `DF2K/valid/corrupted`?
Thanks a lot.
Thank you for pointing out the issues; I have updated the comments in `main_sr.py`.
For the `valid` dataset, you can put the HR data generated by the `create_dataset.py` script into `DF2K/valid/clean` and the produced noisy LR data into `DF2K/valid/corrupted`.
For the NTIRE challenge task, they provide a validation set where the clean images are the HR images and the corrupted images are the LR images. You can use this validation set to reproduce the challenge results.
Thanks.
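As a quick sanity check, something like the following hypothetical snippet (the `dataroot` value is an assumption based on `imdbTestPath='./datasets/'` in the log above; the helper is not part of the repo) can confirm the folders are in place before launching training:

```python
import os

dataroot = './datasets/'  # assumed to match imdbTestPath in the training log
for sub in ('DF2K/valid/clean/', 'DF2K/valid/corrupted/'):
    path = os.path.join(dataroot, sub)
    count = len(os.listdir(path)) if os.path.isdir(path) else 0
    print(f'{path}: {count} files')
```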
Thanks for responding! Reading the modified `main_sr.py`, I saw `[x2-->64,x3-->42,x4-->32]`.
Another question just popped into my head: if I want `upscale=5`, what should `patch_size` be? For instance, suppose I have a pair of images with a low resolution of 100x100 and a high resolution of 500x500. One option is to resize the 100x100 LR images to 125x125 and then train the model with `upscale=4`. The other option is to just set `upscale=5`, but does the model even support `upscale=5`? If it does, what is the appropriate `patch_size`, and how can I calculate the corresponding `patch_size` from the `upscale`? Is it just simple math (I see the patch size for x4 is half that for x2), or should I read it from the model structure?
Sorry for bothering you with such newbie questions, LOL.
Thanks in advance.
Hi, thank you for your questions.
Currently the patch size depends on the discriminator net settings, because it takes 128x128 patches, so the output of the SR generator net must be 128x128. If you change the discriminator's settings to accept an arbitrary patch size, the upscale factor can be chosen arbitrarily.
For the existing settings, with `upscale=5` the patch size would be about 26, i.e. 128 / upscale.
Thanks.