XiandaGuo/OpenStereo

FADNet tensor size mismatch


Hi!

I attempted to validate FADNet on the ETH3D dataset using the pretrained weights FADNet_sceneflow.pt. However, I ran into a dimension mismatch because the ETH3D image size is (513, 888). The error occurs at the line concat5 = torch.cat((upconv5, upflow6, conv5b), 1), where the three tensors have the following sizes:

upconv5: torch.Size([1, 512, 18, 28])
upflow6: torch.Size([1, 1, 18, 28])
conv5b: torch.Size([1, 512, 17, 28])
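The mismatch comes from the odd input height: each stride-2 layer rounds the spatial size, so the decoder's upsampled feature map (9 → 18) no longer matches the encoder's skip connection (17). A minimal sketch of the size bookkeeping, assuming ceil rounding at every downsampling level (which reproduces the reported 18-vs-17 mismatch; the exact rounding depends on each layer's padding):

```python
import math

def downsample_sizes(h, w, levels=6):
    """Track (H, W) through successive stride-2 downsamplings,
    rounding up at each level, as in a DispNet-style encoder."""
    sizes = []
    for _ in range(levels):
        h, w = math.ceil(h / 2), math.ceil(w / 2)
        sizes.append((h, w))
    return sizes

# ETH3D's odd height of 513 drifts out of sync:
print(downsample_sizes(513, 888))  # heights: 257, 129, 65, 33, 17, 9
# The decoder upsamples 9 -> 18, but the level-5 skip is 17: mismatch.

# A height padded to a multiple of 64 stays consistent:
print(downsample_sizes(512, 960))  # heights: 256, 128, 64, 32, 16, 8
# Here 2 * 8 == 16, so the concat succeeds at every level.
```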

Is additional padding or resizing required here? Do you have any suggestions on how to address this for those who would like to run inference on datasets other than SceneFlow (such as ETH3D, KITTI 2012, etc.)? Thanks!

For ETH3D, you can use the following transform setting:

    val:
      - type: StereoPad
        size: [ 512, 960 ]
      - type: GetValidDisp
        max_disp: 192
      - type: TransposeImage
      - type: ToTensor
      - type: NormalizeImage
        mean: [ 0.485, 0.456, 0.406 ]
        std: [ 0.229, 0.224, 0.225 ]

For most models, the input image size needs to be a multiple of 32 or 64, so that the encoder and decoder feature maps stay aligned at every level. So generally speaking, adding StereoPad with a suitable size will do the trick.