drprojects/DeepViewAgg

ImageBatch of the S3DIS dataset

ruomingzhai opened this issue · 15 comments

Hi, Dr. Robert.
I ran into another problem while customizing my model based on your DeepViewAgg project, using the S3DIS dataset.
I added a custom classification label for each pixel in the images, i.e., I extended the ImageMapping class with an additional values[3] entry in ImageMapping.values, as follows:
image

But when I train the model with my custom dataset class, the ImageBatch (derived from SameSettingImageBatch?) presents different downscale values such as 1, 0.5, and 0.25. Why does it show different scales? I thought the scale should be the same for all images.

The program runs fine with downscale=1.0 but crashes with downscale=0.5 or 0.25.
I traced the code to the upscale_images() function in torch_points3d/core/multimodal/image.py, and it seems you recalculate the pixel coords and update them in value[1].values[0].
image
The pixel coordinate indices exceed the output size of the pretrained image model, which is fixed to [batch, C_dim, 1024, 512].

Hope to get some help from you.
Best regards,

Hi @ruomingzhai

With the information you provided, I cannot tell what your problem is. Can you please provide some code snippets of:

  • the modifications you made to ImageMapping
  • the command you run and the full error message traceback you get

A remark though: if you try to extend the ImageMapping with some additional pixel-level information, you should not append stuff in ImageMapping.values (which carry image-level info), but in ImageMapping.values[1].values (which carry pixel-level stuff).

Thank you for your attention!
First of all, the modification I made to ImageMapping:

  • I add the pixel-level information in the process function of the MapImages class as follows:

image

The image_sp_ids attribute is a tensor with the same number of columns as the pixels attribute.
In the from_dense function, I compress image_sp_ids together with the other attributes as follows:

image

So, you mean I should add my image_sp_ids attribute to image_ids, which refers to ImageMapping.values[1]?

image

  • Secondly, I use the ADE20KResNet18_C1_deepsup model as the pretrained image model and constrain the output tensor to [i,c,512,1024] as follows:

image

  • Third, I added a function get_mapped_features_sp and a property sp_map_indexing to retrieve my image_sp_idx during training:
    def get_mapped_features_sp(self, interpolate=False):
        scale = 1 / self.downscale

        # If not interpolating, set the mapping to the proper scale
        mappings = self.mappings if interpolate \
            else self.mappings.rescale_images(scale)
        # if not interpolate:
        #     self.mappings.rescale_images(scale)

        # Index the features with/without interpolation
        if interpolate and scale != 1:
            resolution = torch.Tensor([self.mapping_size]).to(self.device)
            coords = mappings.pixels / (resolution - 1)
            coords = coords[:, [1, 0]]  # pixel mappings are in (W, H) format
            batch = mappings.sp_map_indexing[0]
            x = sparse_interpolation(self.x, coords, batch)
        else:
            mod_idx, sp_sorting_idx, sp_csr, sp_im_csr = mappings.sp_map_indexing
            x = self.x[mod_idx]  # [num_views, 512, height, width]
            self.mappings.sp_csr_idx = sp_csr
            self.mappings.sp_sorting_idx = sp_sorting_idx
            self.mappings.sp_im_csr = sp_im_csr
        return x

    @property
    def sp_map_indexing(self):
        # Image indices, repeated once per mapped pixel
        idx_batch = self.images.repeat_interleave(
            self.values[1].pointers[1:] - self.values[1].pointers[:-1])
        sp_sorting_idx = lexargsort(idx_batch, self.values[3])  # [N_i, 1]

        # Pixel indices, sorted by (image, superpixel)
        idx_batch = idx_batch[sp_sorting_idx]
        idx_height = self.pixels[sp_sorting_idx, 1]
        idx_width = self.pixels[sp_sorting_idx, 0]
        idx = (idx_batch.long(), ..., idx_height.long(), idx_width.long())
        # self._sp_sorting_idx = sp_sorting_idx

        # CSR pointers marking where the superpixel id changes
        sp_pointer = self.values[3][sp_sorting_idx]
        sp_idx = torch.cat([
            torch.LongTensor([0]).to(self.device),
            torch.where(sp_pointer[1:] != sp_pointer[:-1])[0] + 1])
        sp_im_idx = torch.cat([
            torch.LongTensor([0]).to(self.device),
            torch.where(idx_batch[sp_idx[1:]] != idx_batch[sp_idx[:-1]])[0] + 1,
            torch.LongTensor([sp_idx.shape[0]]).to(self.device)])
        sp_csr_idx = torch.cat([
            sp_idx, torch.LongTensor([sp_pointer.shape[0]]).to(self.device)])
        # self._sp_csr_idx = sp_csr_idx

        sp_3d_idx = self.points.repeat_interleave(
            self.pointers[1:] - self.pointers[:-1])
        return idx, sp_3d_idx, sp_csr_idx, sp_im_idx

The error happens at:

x = self.x[mod_idx] #[num_views,512,heigh,wide]

as follows:

] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [41,0,0], thread: [87,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [41,0,0], thread: [88,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [41,0,0], thread: [89,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [41,0,0], thread: [90,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds"  failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [41,0,0], thread: [91,0,0] Assertion  index >= -sizes[i] && index < sizes[i] && "index out of bounds"  failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [41,0,0], thread: [92,0,0] Assertion  index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [41,0,0], thread: [93,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [41,0,0], thread: [94,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [41,0,0], thread: [95,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
  0%|                                                                                                                      | 0/2000 [01:04<?, ?it/s]
Traceback (most recent call last):
  File "s3dis_preprocess.py", line 120, in <module>
    initial_trainer()
  File "s3dis_preprocess.py", line 95, in initial_trainer
    trainer.train()
  File "/root/share/code/DeepViewAgg/torch_points3d/trainer.py", line 147, in train
    self._train_epoch(epoch)
  File "/root/share/code/DeepViewAgg/torch_points3d/trainer.py", line 202, in _train_epoch
    self._model.optimize_parameters(epoch, self._dataset.batch_size)
  File "/root/share/code/DeepViewAgg/torch_points3d/models/base_model.py", line 245, in optimize_parameters
    self.forward(epoch=epoch)  # first call forward to calculate intermediate results
  File "/root/share/code/DeepViewAgg/torch_points3d/models/segmentation/multimodal/sparseconv3d.py", line 299, in forward
    mm_data_dict = self.backbone_no3d(self.input)
  File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/share/code/DeepViewAgg/torch_points3d/applications/multimodal/no3d.py", line 116, in forward
    mm_data_dict = self.down_modules[i](mm_data_dict)
  File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/share/code/DeepViewAgg/torch_points3d/modules/multimodal/modules.py", line 97, in forward
    mm_data_dict = mod_branch(mm_data_dict, m)
  File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/share/code/DeepViewAgg/torch_points3d/modules/multimodal/modules.py", line 485, in forward
    x_mod = mod_data.get_mapped_features_sp(interpolate=self.interpolate)
  File "/root/share/code/DeepViewAgg/torch_points3d/core/multimodal/image.py", line 1579, in get_mapped_features_sp
    return [im.get_mapped_features_sp(interpolate=interpolate) for im in self]
  File "/root/share/code/DeepViewAgg/torch_points3d/core/multimodal/image.py", line 1579, in <listcomp>
    return [im.get_mapped_features_sp(interpolate=interpolate) for im in self]
  File "/root/share/code/DeepViewAgg/torch_points3d/core/multimodal/image.py", line 1327, in get_mapped_features_sp
    x = self.x[mod_idx] #[num_views,512,heigh,wide]
RuntimeError: CUDA error: device-side assert triggered

It seems mod_idx exceeds the valid index range of the image model output. I found that the image pixel indices change when the scale value is not 1.0 in this code:

        mappings = self.mappings if interpolate \
            else self.mappings.rescale_images(scale)

image

So I think the problem lies with the scale value, not with the code I added.
I hope I explained my problem clearly!

I took the liberty to edit your previous message, for readability.

I can't tell for sure what is happening here. Some more explanation of what your modifications are for might be helpful:

  • the type and shape of your image_sp_ids, what it represents, etc.
  • how do you use ADE20KResNet18_C1_deepsup as an image model? Which model config are you using? Are you sure you can fit a whole image encoder-decoder in your GPU memory?

If I had to guess what you are trying to do, I would say image_sp_ids holds the indices of the superpixels in which each mapping falls. Still guessing, this means your image_sp_ids should be a 1D LongTensor of shape [ImageMapping.pixels.shape[0]] (i.e. it has the same number of rows as the pixels attribute, not the same number of columns as you previously mentioned). Is that correct? When debugging, you can use ImageMapping.debug() to check that there are no errors in how you constructed your mappings. This will not catch all possible errors, but can rule out simple ones.

You seem to have replaced ImageMapping.get_mapped_features with your ImageMapping.get_mapped_features_sp. From what I see, the traceback seems to indicate the error comes from there, but I don't think x = self.x[mod_idx] itself is the problem.

Make sure you set CUDA_LAUNCH_BLOCKING=1 when debugging CUDA-related errors. Otherwise you won't get trustworthy error logs:

CUDA_LAUNCH_BLOCKING=1 python train.py <your kwargs here>

This might give you a more specific error message.
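In addition to CUDA_LAUNCH_BLOCKING=1, an "index out of bounds" assert can often be narrowed down by checking the indices against the feature map size on the CPU before indexing. The following is only a minimal sketch in plain Python: find_out_of_bounds and the example sizes are hypothetical and not part of DeepViewAgg, and it assumes pixel coordinates stored in (W, H) order.

```python
def find_out_of_bounds(pixels, height, width):
    """Return (row, x, y) for every mapping whose pixel falls outside the map.

    pixels: iterable of (x, y) integer coordinates, i.e. (W, H) order.
    height, width: spatial size of the feature map being indexed.
    """
    bad = []
    for row, (x, y) in enumerate(pixels):
        if not (0 <= x < width and 0 <= y < height):
            bad.append((row, x, y))
    return bad


# Hypothetical example: a 64 x 128 (H x W) feature map
pixels = [(0, 0), (127, 63), (128, 10), (5, 64)]
print(find_out_of_bounds(pixels, height=64, width=128))
```

Running such a check right before the failing indexing line shows which mappings are invalid and by how much, which usually hints at a scale mismatch.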

Overall, I am guessing your error is related to the scale ratio between the input image and the output image feature map. ImageMapping.get_mapped_features should normally be able to tell how you rescaled your image and return a big [num_mappings, num_features] Tensor of the 2D features associated with the mappings. One current limitation of this code is that it does not support rescaling image width and height separately (i.e. the width / height aspect ratio must always stay constant). So make sure your input images and output feature maps have the same aspect ratio. Have a look here to see where the downscale attribute is updated based on the input/output image feature maps size ratio.
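To make the constraint concrete, here is a rough sketch of the kind of check involved. It is not the actual DeepViewAgg implementation: compute_downscale is a hypothetical helper, and it simply assumes the downscale is the input size divided by the output size, identical along height and width.

```python
from fractions import Fraction


def compute_downscale(input_hw, output_hw):
    """Return the input/output scale ratio, or raise if it is not usable.

    input_hw, output_hw: (height, width) of the input image and of the
    output feature map.
    """
    in_h, in_w = input_hw
    out_h, out_w = output_hw
    scale_h = Fraction(in_h, out_h)
    scale_w = Fraction(in_w, out_w)
    if scale_h != scale_w:
        raise ValueError(
            f"aspect ratio changed: height scale {scale_h}, width scale {scale_w}")
    if scale_h < 1:
        raise ValueError(
            f"output is larger than input: downscale {float(scale_h)} < 1")
    return float(scale_h)


print(compute_downscale((256, 512), (64, 128)))  # 4.0
```

With this convention, an output feature map larger than the input image gives a ratio below 1, which the sketch rejects.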

Thanks!
You are right. The image_sp_ids is a 1D tensor with the same number of rows as the pixels attribute.
I want better mapped image features from a more complete encoder-decoder image model, and it is quite a heavy load for a 16G GPU because it always crashes. So I changed resolution_3d to 0.3m just to get the code working first. I am not sure it is a good idea to train this large 2D model right now.

The ADE20KResNet18_C1_deepsup model is shown as follows:

class ADE20KResNet18_C1_deepsup(nn.Module):
    """ResNet-18 encoder pretrained on ADE20K with C1_deepsup.

    Adapted from https://github.com/CSAILVision/semantic-segmentation-pytorch
    """
    def __init__(self, *args, frozen=False, pretrained=True, **kwargs):
        super().__init__()

        # Adapt the default config to use ResNet18 + PPM-Deepsup model
        ARCH = 'resnet18dilated-c1_deepsup'
        DIR = osp.join(PRETRAINED_DIR, 'ade20k', ARCH)
        MITCfg.merge_from_file(osp.join(DIR, f'{ARCH}.yaml'))
        MITCfg.MODEL.arch_encoder = MITCfg.MODEL.arch_encoder.lower()
        MITCfg.MODEL.arch_decoder = MITCfg.MODEL.arch_decoder.lower()
        MITCfg.DIR = DIR

        # Absolute paths of model weights
        MITCfg.MODEL.weights_encoder = osp.join(
            MITCfg.DIR, 'encoder_' + MITCfg.TEST.checkpoint)
        MITCfg.MODEL.weights_decoder = osp.join(
            MITCfg.DIR, 'decoder_' + MITCfg.TEST.checkpoint)

        assert osp.exists(MITCfg.MODEL.weights_encoder) and \
               osp.exists(MITCfg.MODEL.weights_decoder), \
            "checkpoint does not exist!"
        decoder_pretrianed = kwargs.get('decoder_pretrained',False)
        # Build encoder and decoder from pretrained weights
        old_stdout = sys.stdout  # backup current stdout
        sys.stdout = open(os.devnull, "w")
        self.pretrained = pretrained
        self.encoder = MITModelBuilder.build_encoder(
            arch=MITCfg.MODEL.arch_encoder,
            fc_dim=MITCfg.MODEL.fc_dim,
            weights=MITCfg.MODEL.weights_encoder if pretrained else '')
        self.decoder_pretrianed = decoder_pretrianed
        if self.decoder_pretrianed:
            self.decoder = MITModelBuilder.build_decoder(
                arch=MITCfg.MODEL.arch_decoder,
                fc_dim=MITCfg.MODEL.fc_dim,
                num_class=MITCfg.DATASET.num_class,
                weights=MITCfg.MODEL.weights_decoder if pretrained else '',
                use_softmax=True)
        else:
            self.decoder = nn.Sequential(
                nn.Conv2d(512, 256, 1),
                nn.Upsample(scale_factor=8, mode="bilinear", align_corners=True))
        sys.stdout = old_stdout  # reset old stdout

        # Convert PPM from a classifier into a feature map extractor
        # self.decoder = PPMFeatMap.from_pretrained(self.decoder)

        # If the model is frozen, it will always remain in eval mode
        # and the parameters will have requires_grad=False
        self.frozen = frozen
        if self.frozen:
            self.training = False
        self.normalize_feature = True

    def forward(self, x, *args, out_size=None, **kwargs):
        enc = self.encoder(x, return_feature_maps=True)
        if self.decoder_pretrianed :
            pred = self.decoder(enc,segSize=[512,1024]) #[240,320]
        else:
            pred = self.decoder(enc[3])
        if self.normalize_feature:
            pred = F.normalize(pred, p=2, dim=1)
        return pred

About the error, I just found that I call get_mapped_features followed by my custom get_mapped_features_sp in the 2D forward function:

            mod_data = self.forward_conv(mod_data)

            if self.superpixel_pool is not None:
                x_mod = mod_data.get_mapped_features(interpolate=self.interpolate)
                x_mod = self.forward_superpixel_pool(x_3d, x_mod, mod_data)  # attention-constrained 2D features

            x_mod = mod_data.get_mapped_features_sp(interpolate=self.interpolate)
            # Aggregate the 2D pixel features per superpixel (reduction=mean)
            sp_csr = mod_data.get_sp_csr_idx()
            sp_sorting = mod_data.get_sp_sorting_idx()
            sp_im_csr = mod_data.get_sp_im_csr()
            # idx = (sp_sorting[0], ...)  # aggregate the 3D point features per superpixel (reduction=mean)
            x_mod = self.forward_atomic_pool(x_3d, x_mod, sp_csr)

Maybe calling these two similar get_mapped_features functions one after the other is the reason why the downscale changes. I will investigate it in the coming days.

I used the ImageMapping.debug() you suggested, and its error message shows:

    self.debug()
  File "/root/share/code/DeepViewAgg/torch_points3d/core/multimodal/image.py", line 344, in debug
    assert self._downscale >= 1, \
AssertionError: Expected scalar larger than 1 but got 0.25 instead

I don't know why this happens.

I must warn you that the 2D model is the most memory-intensive part of the architecture, much more so than the 3D encoder-decoder. In this regard, using a full 2D encoder-decoder will be especially costly, because you want to produce feature maps at a large resolution, with a lot of channels. We address this issue in our paper and in the code, by showing that using a full image decoder is overkill:

  • Because the 2D-3D mappings are inherently sparse in datasets like S3DIS and KITTI-360, we only pass a fraction of the full-resolution output feature maps to the 3D points. So computing a full-resolution output feature map is memory-inefficient
  • Instead, we proposed a pyramid interpolation scheme to extract multi-scale features from the 2D encoder feature maps much like an FPN would do, but without having to compute the whole 2D decoder feature maps. See the Res16UNet34-PointPyramid-early-cityscapes-interpolate model for instance. This relies on bilinear interpolation of the low-resolution feature maps only at the desired mapping pixel coordinates
  • This pyramid interpolation scheme proved useful for the KITTI-360 dataset, but not essential for S3DIS, because the pretrained 2D ResNet-18 encoder we use for S3DIS outputs relatively "high-resolution" features at its lower stage.
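As an illustration of the idea of interpolating only at the mapped pixels, here is a small plain-Python sketch. It is not the project's sparse_interpolation: bilinear_sample and sample_at_mappings are hypothetical names, the feature map has a single channel stored as nested lists, and the feature-map coordinate of a pixel is assumed to be simply the pixel coordinate divided by the downscale.

```python
def bilinear_sample(fmap, x, y):
    """Bilinearly interpolate a 2D grid (list of rows) at float coords (x, y)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp to the valid range so border pixels stay inside the grid
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    top = fmap[y0][x0] * (1 - tx) + fmap[y0][x1] * tx
    bottom = fmap[y1][x0] * (1 - tx) + fmap[y1][x1] * tx
    return top * (1 - ty) + bottom * ty


def sample_at_mappings(fmap, pixels, downscale):
    """Sample a low-resolution map only at the mapped full-resolution pixels.

    pixels: list of (x, y) coordinates in the full-resolution image.
    downscale: ratio between the image size and the feature map size.
    """
    return [bilinear_sample(fmap, px / downscale, py / downscale)
            for px, py in pixels]


# A 2 x 2 feature map standing in for a 4 x 4 image (downscale = 2)
low_res = [[0.0, 1.0],
           [2.0, 3.0]]
print(sample_at_mappings(low_res, [(0, 0), (1, 1), (2, 0)], downscale=2))
```

The cost scales with the number of mappings rather than with the full-resolution image size, which is the point of skipping the dense decoder output.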

If you still want to use a full 2D encoder-decoder, it is possible. I have already done so with the ADE20KResNet18PPM encoder-decoder. You can have a look at the XYZ-RGB-PPM-late model config here for pointers. Besides, here are some insights about using an image decoder:

  • reducing your 2D resolution will save you much more memory than reducing your 3D resolution, and might better preserve your final 3D semantic segmentation performance.
  • you could consider freezing your 2D model to save memory and compute at train time. This means you will not fine-tune your 2D model
  • if fine-tuning the 2D model is important to you (it does improve performance in my experience), you can use checkpointing (see the checkpointing parameter for UnimodalBranch). This allows you to fit larger models in memory when training, but at the cost of longer training time.

As for your current error, please make sure you set CUDA_LAUNCH_BLOCKING=1 when investigating errors happening on the GPU. Your previously posted error traceback does not look complete to me and does not indicate where the error actually comes from.

Your above error indicates that you have self.downscale set to a value smaller than 1, which is not acceptable. This might happen if your output 2D feature map resolution is larger than your input image resolution, or if you manually modified self.downscale in an unexpected way somewhere. As suggested yesterday, have a look here to see where the downscale attribute is updated based on the input/output image feature maps size ratio.

Sorry, I forgot to mention that I have been running the program with CUDA_LAUNCH_BLOCKING=1 from the start. I think the error comes from the wrong scale. So I want to ask: is it possible for the scale to be 0.25?

Did you set the scale yourself manually somewhere? What are the shapes of your input images and output feature maps? Can you share which model config you are using? In the piece of code you showed, your scale should normally be larger than 1. If it is smaller, it means your output feature maps have a higher resolution than your input image (unlikely).

I only set the image size in the image decoder with pred = self.decoder(enc,segSize=[512,1024]), but I don't think that is the problem. I will debug the code to find where the scale changes. Thank you for your tips!

def forward(self, x, *args, out_size=None, **kwargs):
        
        enc = self.encoder(x, return_feature_maps=True)
        if self.decoder_pretrianed :
            pred = self.decoder(enc,segSize=[512,1024]) 
        else:
            pred = self.decoder(enc[3])
        if self.normalize_feature:
            pred = F.normalize(pred, p=2, dim=1)
        return pred

Hi, sorry to bother you again, but my problem has not been solved.
I followed your suggestions and traced the bug with ImageMapping.debug(). It shows that each SameSettingImageBatch has a different img_size, as shown below:
image
I think this is the reason why the downscale changes while preparing the image batch before training. It did not happen with my ScanNet dataset, and I guess it comes from the data transform "CropImageGroups". Does it happen to you? I can't control its cropping operation.
Looking forward to hearing your suggestions.

I traced back to the CropImageGroups data transform, and I think it may be the reason why ImageData.img_size changes, which in turn changes the downscale. I added some code to CropImageGroups as follows:

image

And the result is :

image original_size :(512, 256) cropping_size:(128, 64) scale :0.25
image original_size :(512, 256) cropping_size:(128, 128) scale :0.5
image original_size :(512, 256) cropping_size:(128, 128) scale :0.5
image original_size :(512, 256) cropping_size:(256, 256) scale :1.0
image original_size :(512, 256) cropping_size:(128, 64) scale :0.25
image original_size :(512, 256) cropping_size:(128, 128) scale :0.5
image original_size :(512, 256) cropping_size:(256, 128) scale :0.5
image original_size :(512, 256) cropping_size:(256, 256) scale :1.0
image original_size :(512, 256) cropping_size:(128, 128) scale :0.5
image original_size :(512, 256) cropping_size:(256, 256) scale :1.0
image original_size :(512, 256) cropping_size:(128, 128) scale :0.5
image original_size :(512, 256) cropping_size:(256, 256) scale :1.0

So is it because my resolution_3d=0.3 leads to different image sizes?

I hope to hear your suggestions! This has been bothering me a lot.
For now, I will try to preprocess the point cloud with resolution_3d=0.02 (the default value) and check whether the same bug occurs.

If you look at conf/data/segmentation/multimodal/s3disfused-sparse.yaml, here is what each transform does:

  • SelectMappingFromPointId: reads the mapping_index key in the loaded 3D points Data object and loads the corresponding images and mappings
  • CenterRoll: rotate the image sphere around the Z axis to place the mapped pixels in the center of the image (for S3DIS equirectangular images only). This prepares the next steps PickImagesFromMappingArea and CropImageGroups
  • PickImagesFromMappingArea: drop images which do not contain enough mapped pixels
  • CropImageGroups: crop each image with a bounding box including all its mapped pixels. This produces images of various sizes, depending on how far the image is located from the 3D points. But since we prefer processing images in same-size batches, the bounding boxes are adjusted to the nearest box among a family of boxes (eg [64, 64], [128, 64], [128, 128], [256, 128], [256, 256], [512, 256], [512, 512], ...). The output of this transform is an ImageData and not a SameSettingImageData. An ImageData is a holder class to carry a list of SameSettingImageData with different sizes, precisely what CropImageGroups outputs.
  • PickImagesFromMemoryCredit: select a subset of the images based on their size and how informative their mappings are. This transform ensures that the number of images in the batch does not exceed a chosen "pixel credit". This is an augmentation and caps the GPU memory usage.
  • JitterMappingFeatures: add some noise to the mapping features
  • ColorJitter: add some noise to the image radiometry
  • RandomHorizontalFlip: randomly flip images (and mappings) horizontally
  • Normalize: normalize image radiometry

So it is perfectly normal for CropImageGroups to alter image size, it is what it is designed for.
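To give an intuition of the box-snapping step in CropImageGroups, here is a simplified plain-Python sketch. It is not the actual transform: snap_to_family is a hypothetical helper, the family below only reuses the example sizes listed above (assumed to be in (W, H) order), and "nearest" is interpreted here as the smallest-area box of the family that contains the bounding box.

```python
from collections import defaultdict

# Example family of allowed crop sizes, assumed to be (width, height)
FAMILY = [(64, 64), (128, 64), (128, 128), (256, 128),
          (256, 256), (512, 256), (512, 512)]


def snap_to_family(bbox_w, bbox_h, family=FAMILY):
    """Return the smallest-area box in the family that contains the bbox."""
    candidates = [(w, h) for w, h in family if w >= bbox_w and h >= bbox_h]
    if not candidates:
        raise ValueError(f"no box in the family contains ({bbox_w}, {bbox_h})")
    return min(candidates, key=lambda wh: wh[0] * wh[1])


# Bounding boxes of the mapped pixels in four hypothetical images
bboxes = [(100, 50), (100, 70), (30, 30), (120, 60)]

# Group image indices by snapped crop size, one group per same-size batch
groups = defaultdict(list)
for i, (w, h) in enumerate(bboxes):
    groups[snap_to_family(w, h)].append(i)
print(dict(groups))
```

Each key of groups would correspond to one same-size batch, which is why a single ImageData can hold several SameSettingImageData with different image sizes.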

I think the error may come from the fact that you hardcoded an output feature map size in your image decoder:

self.decoder(enc,segSize=[512,1024])

If this does what I think it does, it means your decoder does not respect the cropping sizes produced by CropImageGroups and it outputs feature maps which are even larger than some of your input feature maps (which is unlikely, unless you work on super-resolution). If that is the case, I suggest you make sure the output of your image model is the same size as your input, or smaller with the same scaling applied on height and width (for now, the code does not support aspect ratio changes).
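One way to avoid a hardcoded size is to derive the requested output size from each input batch. The sketch below is plain Python with a hypothetical helper, seg_size_for, and it assumes the decoder accepts a [height, width] target such as the segSize argument used above. In the real model the input size would presumably come from something like x.shape[-2:], which is an assumption on my side.

```python
def seg_size_for(input_hw, downscale=1):
    """Return the [height, width] to request from the decoder.

    input_hw: (height, width) of the current input image batch.
    downscale: integer factor >= 1 applied identically to both dimensions,
        so the aspect ratio of the input is preserved.
    """
    h, w = input_hw
    if downscale < 1:
        raise ValueError("downscale must be >= 1")
    if h % downscale or w % downscale:
        raise ValueError(f"({h}, {w}) is not divisible by {downscale}")
    return [h // downscale, w // downscale]


# Crop sizes similar to those produced by CropImageGroups, as (H, W)
for crop in [(64, 128), (128, 128), (256, 256)]:
    print(crop, "->", seg_size_for(crop), seg_size_for(crop, downscale=2))
```

This way every cropped batch gets an output of matching or proportionally smaller size, and the downscale never drops below 1.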

PS: the 3D point cloud resolution should not have anything to do with this, you can change it however you want.

Hi, Dr. Robert.
The bug indeed comes from the fixed image size in self.decoder(enc,segSize=[512,1024])!
I can train the model now, but it randomly crashes with the resnet18dilated-c1_deepsup encoder:

File "/root/DeepViewAgg/torch_points3d/modules/multimodal/modalities/image.py", line 779, in forward
  enc = self.encoder(x)

File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
  result = self.forward(*input, **kwargs)

File "/root/.local/lib/python3.8/site-packages/mit_semseg/models/models.py", line 259, in forward
  x = self.maxpool(x)

File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
  result = self.forward(*input, **kwargs)

File "/root/.local/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 153, in forward
  return F.max_pool2d(input, self.kernel_size, self.stride,

File "/root/.local/lib/python3.8/site-packages/torch/_jit_internal.py", line 267, in fn
  return if_false(*args, **kwargs)

File "/root/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 585, in _max_pool2d
  return torch.max_pool2d(

RuntimeError: non-empty 3D or 4D input tensor expected but got ndim: 4

Hi @ruomingzhai

So it seems the reason for your initial problem was that your modifications broke the way the code makes use of images batched together at various sizes, following CropImageGroups.

The initial problem for this issue has been solved, so I will close it.

Your latest comment is clearly about a different problem and concerns an input or output feature shape in a 2D model resnet18dilated-c1_deepsup I did not build.

I am really glad 😊 you find this project interesting and are trying to make use of it. However, there are limits to how much time I can dedicate to support here:

  • ✔️ I can provide help to clarify how things I built work and make some suggestions on how to extend the current codebase.

  • ❌ I cannot provide support for things I did not build myself and that break the way the project works.

👉 I would recommend making sure you understand the paper and reading the code and comments more closely. Typically, this means understanding how the provided model works and the size of the inputs and outputs in each block of the model.

Under these circumstances, if you are running into a new challenge, which you cannot solve after having thoroughly investigated it yourself, feel free to open an issue with detailed information on the modifications you made and the error logs.

Thanks in advance and good luck 🍀 !