j96w/DenseFusion

Segmentation on LineMOD dataset


I am trying to train the segmentation network on the LineMOD dataset and got the following error.
I changed the paths (in train.py and data_controller.py) needed to load the LineMOD RGB and mask files, and changed the final number of classes from 22 to 14 wherever needed in loss.py and segnet.py.

Here is the error:

5000 1000
/usr/local/lib/python3.5/dist-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
2019-08-08 14:20:55,080 : Train time 00h 00m 00s, Training started
Traceback (most recent call last):
  File "train.py", line 74, in <module>
    semantic_loss = criterion(semantic, target)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ros/Object_Pose_Estimation/DenseFusion/vanilla_segmentation/loss.py", line 35, in forward
    return loss_calculation(semantic, target)
  File "/home/ros/Object_Pose_Estimation/DenseFusion/vanilla_segmentation/loss.py", line 24, in loss_calculation
    semantic_loss = CEloss(semantic, target)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 1788, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (921600) to match target batch_size (2764800).

hygxy commented

@sanjaysswami Same issue here. I made my own dataset, which consists of only 4 classes, and I got a similar error except for the last line:

  • ValueError: Expected input batch_size (921600) to match target batch_size (3686400).

@hygxy If you find any solution, please let me know. Thanks in advance.

hygxy commented

@sanjaysswami Adding convert("L") to the label might help, i.e. change this line to the following:

  • label = np.array(Image.open('{0}/{1}-label.png'.format(self.root, self.path[index])).convert("L"))
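For context, a minimal sketch of why this helps (the path here is a placeholder, not from the repo). An RGB label PNG has three channels, so its flattened size is 3x the per-pixel output that nll_loss compares it against (4x for RGBA):

```python
import numpy as np
from PIL import Image

# '0000-label.png' is an illustrative path, not the actual dataset file.
rgb  = np.array(Image.open('0000-label.png'))               # shape (480, 640, 3) for an RGB PNG
gray = np.array(Image.open('0000-label.png').convert('L'))  # shape (480, 640)

# nll_loss flattens the target, so a 3-channel label has 3x as many
# elements as the network's per-pixel output (4x for an RGBA label),
# which is exactly the 921600 vs. 2764800 / 3686400 mismatch above.
print(rgb.size, gray.size)  # 921600 vs. 307200 for one 640x480 image
```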

But after that I got another problem:

/opt/conda/conda-bld/pytorch_1550796191843/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [31,0,0] Assertion t >= 0 && t < n_classes failed.
RuntimeError: CUDA error: device-side assert triggered

@hygxy I tried it as you did and got the same error. Trying to resolve it now.

@hygxy After that step you need to normalize the label pixels to the range 0-1. It is working for me now.

hygxy commented

@sanjaysswami Could you please also show me the normalization code?

Using this formula: (x - x.min()) / (x.max() - x.min()) # values from 0 to 1, where x is the label.
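A minimal sketch of that formula, assuming the label has already been loaded as a NumPy array (the helper name is illustrative):

```python
import numpy as np

def normalize_label(x):
    """Min-max normalize label pixel values to [0, 1] (illustrative helper)."""
    x = x.astype(np.float32)
    return (x - x.min()) / (x.max() - x.min())

# For a binary LineMOD mask (0 = background, 255 = object) this maps
# 255 -> 1.0; the result still has to be cast back to integer class
# indices (e.g. .astype(np.int64)) before CrossEntropyLoss accepts it.
```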

hygxy commented

@TrinhTUHH, thanks for your advice, it's working on my own dataset now.
@sanjaysswami I am wondering why the author didn't do the normalization step, yet the code still works.

When you convert a label image to grayscale, the maximum value is 255 (for YCB it is 21), so my guess is that somewhere in the convolution layers the values become too large for CUDA to handle.

hygxy commented

@TrinhTUHH, if that's the reason, why do we need to normalize all pixels to (0, 1) instead of to (0, 13) in, for example, @sanjaysswami's case?

Hi, sorry, I think my suggestion was wrong because I didn't understand the label images correctly. After reading this issue, I don't think normalization is the right fix here. In a mask (or label) image, the background pixels are marked with one value (in YCB they are 0), and the pixels belonging to a class are marked with the index of that object (for example: 8 is the gelatin box, 21 is foam_brick, 14 is mug, etc.). In LineMOD, all mask images except those in folder 02 have pixels of only two values, 0 and 255 (0 is black for the background, 255 is white for the object), meaning only one object is segmented per frame. In the mask images in folder 02, however, all objects are segmented and labeled, yet there are 22 label values, which exceeds the number of objects in LineMOD (more details in this issue). That might be the real problem causing this error.
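For anyone who wants to verify this, a small sketch (file names are illustrative) that prints which class indices a mask actually contains:

```python
import numpy as np
from PIL import Image

# Illustrative file names; point these at real label/mask files.
ycb_mask = np.array(Image.open('ycb-label.png'))             # YCB: pixel value = object index, 0 = background
lm_mask  = np.array(Image.open('lm-mask.png').convert('L'))  # LineMOD: only 0 and 255 (except folder 02)

print(np.unique(ycb_mask))  # e.g. [ 0  8 14 21]
print(np.unique(lm_mask))   # e.g. [  0 255]
```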

Anyway, I can only check about it tomorrow.

hygxy commented

@TrinhTUHH, we have two problems here:

  • Whenever the mask/label is not a real grayscale image (those in LineMOD look like grayscale but actually are not; you can check with "file xxxx.png", which shows "PNG image data, 640 x 480, 8-bit/color RGB, non-interlaced"), we get the "Expected input batch_size (921600) to match target batch_size (2764800)" error. That's why we need convert("L"), but this causes another error, described in the next point.

  • Whenever the pixel values of the converted grayscale mask/label equal or exceed the number of classes, we get the "device-side assert triggered" error. That's why we need normalization (see the sketch after this list).
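A minimal sanity check covering both failure modes, assuming the label is already a NumPy array (the function name is illustrative):

```python
import numpy as np

def check_label(label, n_classes):
    # Failure mode 1: a multi-channel (RGB/RGBA) label inflates the
    # flattened target size relative to the network's per-pixel output.
    assert label.ndim == 2, \
        "expected a single-channel label, got shape {}".format(label.shape)
    # Failure mode 2: CrossEntropyLoss requires 0 <= t < n_classes for
    # every target pixel; otherwise CUDA raises a device-side assert.
    assert label.min() >= 0 and label.max() < n_classes, \
        "label values {}..{} out of range for {} classes".format(
            label.min(), label.max(), n_classes)
```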

That's basically my understanding. I just can't figure out why the creator of this issue did a (0, 1) normalization instead of a (0, number of classes) normalization, since the latter works too. @sanjaysswami Any explanations?

@hygxy @TrinhTUHH and I work together. We will check today and get back to you.

@hygxy Normalization should not be used here. You can simply replace the pixels that have a value of 255 with the object_id in all label variables in data_controller.py, and then train the network.
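A minimal sketch of that replacement (the path and object id here are illustrative; in data_controller.py they come from the dataset):

```python
import numpy as np
from PIL import Image

obj_id = 14  # illustrative class index for the object in this frame
label = np.array(Image.open('0000-label.png').convert('L'))

# Map the white foreground (255) to the object's class index and keep
# the background at 0, so every target value is a valid class index.
label[label == 255] = obj_id
```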

hygxy commented

@sanjaysswami But after convert("L") I have no pixel values of 255 any more. Could you please also give me an example?

For me, the white region of the label image still has pixels of 255 after convert('L'). If yours does not, then just replace the maximum values with the object id and keep the background pixels at 0.

hygxy commented

I see, so actually we are normalizing the pixel values to (0, number of classes) as I mentioned before?

Hi, I am trying to train the SegNet with my own synthetic dataset, structured like the preprocessed LINEMOD dataset. Maybe my question is a bit stupid, but how does this work? I want the SegNet to be trained so that its output contains only the object I am currently searching for in the picture (like the mask images in the preprocessed LINEMOD dataset). But if I understand correctly, the SegNet would output a mask for every known object in the picture, right?