CARLA Simulator - Semantic Segmentation

Question

Closed this issue 5 years ago · 33 comments

Hello,

Firstly, thanks for this amazing work.

Secondly, I want to train the network on my own dataset, generated with the CARLA Simulator. Are there any tips on how to adapt your implementation to my dataset (which has only 12 semantic classes)?

Answer 1 · 2018-08-24T09:33:04.000Z

Hi! Thanks and sorry for my late reply.
I haven't tested it on CARLA, but it should work if you modify the NUM_CLASSES variable in main.py and the "ignoreIndex" that is passed to iouEval. For example, if your ignore class is the label "11", replace the line "iouEvalTrain = iouEval(NUM_CLASSES)" with "iouEvalTrain = iouEval(NUM_CLASSES, 11)", and do the same for the iouEvalVal line. Hope that works!

Answer 2 · 2018-08-30T07:55:06.000Z

Hi @Eromera,
thanks for the tips.
I also need to remove the weights that are not needed in the train function.

I tried it on my dataset and I am getting this error:

File "/erfnet_pytorch-master/train/erfnet.py", line 20, in forward
 output = torch.cat([self.conv(input), self.pool(input)], 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 170 and 171 in dimension 3 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:111

And this is the part of the code which is giving the error

class DownsamplerBlock (nn.Module):
    def __init__(self, ninput, noutput):
        super(DownsamplerBlock, self).__init__()

        self.conv = nn.Conv2d(ninput, noutput-ninput, (3, 3), stride=2, padding=1, bias=True)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.bn = nn.BatchNorm2d(noutput, eps=1e-3)

    def forward(self, input):
        print([self.conv(input).size(), self.pool(input).size()])
        output = torch.cat([self.conv(input), self.pool(input)], 1)
        output = self.bn(output)
        return F.relu(output)

The output of the print line is:

[(1, 13, 256, 341), (1, 3, 256, 341)]
[(1, 48, 128, 171), (1, 16, 128, 170)]

I see that there is a mismatch in the last dimension (170 vs. 171), but I don't know why.

Do you have any idea what could be causing this?
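A likely explanation, sketched with the usual output-size formulas for strided convolution and pooling: an odd width of 341 rounds differently in the two branches of the block. The helper names `conv_out` and `pool_out` are illustrative and not part of the repository; the kernel, stride, and padding values are the ones in the DownsamplerBlock above.

```python
import math

def conv_out(n, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one axis (floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_out(n, kernel=2, stride=2, ceil_mode=False):
    """Output length of an unpadded max pool along one axis."""
    span = (n - kernel) / stride
    return (math.ceil(span) if ceil_mode else math.floor(span)) + 1

width = 341  # width entering the second DownsamplerBlock
print(conv_out(width))                  # 171
print(pool_out(width))                  # 170
print(pool_out(width, ceil_mode=True))  # 171
```

With an even width such as 682, both branches give 341, which is why the first block does not fail.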

Answer 3 · 2018-08-30T11:47:03.000Z

I solved it by setting ceil_mode=True for the nn.MaxPool2d layer.
Now I have another error:

Traceback (most recent call last):
  File "main.py", line 506, in <module>
    main(parser.parse_args())
  File "main.py", line 460, in main
    model = train(args, model, True) #Train encoder
  File "main.py", line 232, in train
    loss = criterion(outputs, targets[:, 0])
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "main.py", line 82, in forward
    return self.loss(torch.nn.functional.log_softmax(outputs, dim=1), targets)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/loss.py", line 193, in forward
    self.ignore_index, self.reduce)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 1334, in nll_loss
    return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
RuntimeError: input and target batch or spatial sizes don't match: target [5 x 64 x 85], input [5 x 13 x 64 x 86] at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24

Answer 4 · 2018-08-30T13:36:07.000Z

I solved it by manually resizing the target
target = Resize((64, 86), Image.NEAREST)(target)
instead of
target = Resize(int(self.height/8), Image.NEAREST)(target)
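The 64 x 86 versus 64 x 85 difference can be checked by hand. This is only a sketch under assumptions: a 512 x 682 input (inferred from the shapes printed earlier), three stride-2 stages in the encoder (as height/8 suggests), and a Resize that, when given a single integer, scales the shorter side and keeps the aspect ratio. The function names are hypothetical.

```python
def encoder_width(width, stages=3):
    """Width after `stages` stride-2 blocks whose output length is ceil(n / 2)."""
    for _ in range(stages):
        # ceil(n / 2): conv k=3, s=2, p=1 and max pool k=2, s=2 with ceil_mode=True
        width = (width + 1) // 2
    return width

def resized_width(height, width, target_height):
    """Width when the shorter side is scaled to target_height, keeping aspect ratio."""
    return int(target_height * width / height)

height, width = 512, 682  # assumed input size
print(encoder_width(width))                       # 86
print(resized_width(height, width, height // 8))  # 85
```

Passing the explicit (64, 86) size makes the target match the encoder output instead of the aspect-ratio-preserving 64 x 85.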

And I got this error afterwards

main.py:292: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  inputs = Variable(images, volatile=True)    #volatile flag makes it free backward or outputs for eval
main.py:293: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  targets = Variable(labels, volatile=True)
main.py:297: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  epoch_loss_val.append(loss.data[0])
Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f877c13ef10>> ignored
Traceback (most recent call last):
  File "main.py", line 507, in <module>
    main(parser.parse_args())
  File "main.py", line 461, in main
    model = train(args, model, True) #Train encoder
  File "main.py", line 304, in train
    iouEvalVal.addBatch(outputs.max(1)[1].unsqueeze(1).data, targets.data)
  File "/erfnet_pytorch-master/train/iouEval.py", line 61, in addBatch
    tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 2)

Then I replaced this

tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
fpmult = x_onehot * (1-y_onehot-ignores) #times prediction says its that class and gt says its not (subtracting cases when its ignore label!)
fp = torch.sum(torch.sum(torch.sum(fpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
fnmult = (1-x_onehot) * (y_onehot) #times prediction says its not that class and gt says it is
fn = torch.sum(torch.sum(torch.sum(fnmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()

with

tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True))).squeeze()
fpmult = x_onehot * (1-y_onehot-ignores) #times prediction says its that class and gt says its not (subtracting cases when its ignore label!)
fp = torch.sum(torch.sum(torch.sum(fpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
fnmult = (1-x_onehot) * (y_onehot) #times prediction says its not that class and gt says it is
fn = torch.sum(torch.sum(torch.sum(fnmult, dim=0, keepdim=True))).squeeze()

I am not sure whether this is right, but it seemed the obvious change, because both tensors are 1-dim tensors and the second torch.sum tries to sum over dim 2.

Answer 5 · 2018-08-31T09:02:01.000Z

On my dataset, the validation IoU is always the same; it doesn't change.
As a result, the model from the first epoch is saved as the best one and is never replaced afterwards. So far I have trained for 7 epochs and it is still the same.

Answer 6 · 2018-08-31T09:18:26.000Z

Hello,

I have a question about the labels you chose for CARLA. Did you use the red-channel image you get with PostProcessing = "Semantic Segmentation" as your label, or did you convert it into the semantically segmented image (where every object class has a different color) and provide that as the label?

It'd be really helpful to know!
Thanks in advance!

Regards,

Answer 7 · 2018-08-31T13:40:38.000Z

@SoumiDas do you mean instance segmentation?

Answer 8 · 2018-08-31T14:13:35.000Z

No, I mean: for training the semantic segmentation model (here ERFNet), what labels did you use for the RGB images obtained from the CARLA simulator?

Answer 9 · 2018-08-31T14:14:54.000Z

@SoumiDas I used the semantic segmentation labels obtained from CARLA's SemanticSegmentation camera.
However, I still have this problem of a constant validation IoU, and I don't know what is causing it. I am waiting for an answer from @Eromera.

Answer 10 · 2018-08-31T14:16:37.000Z

Okay, so you used the single-channel image, i.e. the red channel, which holds the class label for each pixel. Thanks.

Answer 11 · 2018-08-31T15:18:48.000Z

Exactly, all the labels are encoded in the red channel.
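As a toy illustration of that encoding, the class-id map is simply the first component of every pixel. The sketch below uses nested lists of (R, G, B) tuples and a hypothetical helper name; with real images you would take the same channel with whatever image or array library you already use.

```python
def red_channel_labels(rgb_image):
    """Build a 2-D class-id map from the red channel of an RGB label image."""
    return [[pixel[0] for pixel in row] for row in rgb_image]

# Toy 2 x 2 label image: the red value is the class id, green and blue are zero.
image = [
    [(7, 0, 0), (7, 0, 0)],
    [(1, 0, 0), (10, 0, 0)],
]
print(red_channel_labels(image))  # [[7, 7], [1, 10]]
```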

Answer 12 · 2018-08-31T16:48:17.000Z

Hi,

Did you encounter the error: RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/pytorch-cpu_1532571975038/work/aten/src/THNN/generic/SpatialClassNLLCriterion.c:110 in your execution?

Thanks!
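That assertion generally means some target pixel holds a class id outside the range [0, n_classes) that is not the loss's ignore_index. A minimal sketch for checking a label map before training follows; the function name and the toy data are made up for illustration.

```python
def invalid_labels(label_map, n_classes, ignore_index=None):
    """Return the sorted label values that fall outside [0, n_classes)."""
    bad = set()
    for row in label_map:
        for value in row:
            if value == ignore_index:
                continue  # the ignore label is allowed to be out of range
            if not 0 <= value < n_classes:
                bad.add(value)
    return sorted(bad)

labels = [[0, 7, 12], [3, 255, 13]]
print(invalid_labels(labels, n_classes=13))                    # [13, 255]
print(invalid_labels(labels, n_classes=13, ignore_index=255))  # [13]
```

An empty result means every label fits the number of output channels.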

Answer 13 · 2018-08-31T17:07:51.000Z

One more question: did you set ignore_index to 0, since class 0 means None (void) in the CARLA data?

Answer 14 · 2018-09-03T08:18:01.000Z

I didn't use any ignore_index

Answer 15 · 2018-09-03T13:06:57.000Z

I got varying IoU over the training set.

Answer 16 · 2018-09-03T13:29:24.000Z

What did you change in the code? Did you get any of the errors that I posted above?

Answer 17 · 2018-09-03T13:30:10.000Z

If possible, can you share your Python files with me? Maybe I did something wrong, so I would like to try yours.

Answer 18 · 2018-09-03T13:32:20.000Z

Yes, I got all of the errors you posted. I also set the ignore_index, which in our case is class 0 since it is None. I made no other changes apart from the ones you mentioned and the ignore_index.

Answer 19 · 2018-09-03T13:33:18.000Z

Did you change this line, fp = torch.sum(torch.sum(torch.sum(fpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze(), to sum over one dimension only?

Answer 20 · 2018-09-03T14:41:57.000Z

I set ignore_index=9, but I got this error:

fpmult = x_onehot * (1-y_onehot-ignores) #times prediction says its that class and gt says its not (subtracting cases when its ignore label!)
RuntimeError: The size of tensor a (6) must match the size of tensor b (86) at non-singleton dimension 3

Did you get this before?

Answer 21 · 2018-09-03T14:42:29.000Z

No, I kept it the same as the original code. Also, I just tried 2 epochs to check whether it works. The training set IoU is increasing, but the validation set IoU is constant so far. I'll run more epochs and check.

Answer 22 · 2018-09-03T14:44:57.000Z

@SoumiDas yes, this happens to me too. What about the error above?

Answer 23 · 2018-09-03T14:51:39.000Z

Yes, you're getting that error because of these lines:
x_onehot = x_onehot[:, :self.ignoreIndex]
y_onehot = y_onehot[:, :self.ignoreIndex]

These lines are written to keep every label except the ignore_index (in their case 19, the last one). For CARLA, the index to be ignored is 0, since it is None. I changed them to

x_onehot = x_onehot[:, self.ignoreIndex+1:]
y_onehot = y_onehot[:, self.ignoreIndex+1:]

and got rid of the error I was getting:

RuntimeError: The size of tensor a (0) must match the size of tensor b (86) at non-singleton dimension 3
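The two slices behave like ordinary Python slices along the class dimension, so their effect can be sketched on a plain list of class ids. The 13 channels are taken from the loss error earlier in the thread; the function names are hypothetical and this is not the repository's code.

```python
def keep_before(classes, ignore_index):
    """Mimics [:, :ignoreIndex]: keep only the classes before the ignore index."""
    return classes[:ignore_index]

def keep_after(classes, ignore_index):
    """Mimics [:, ignoreIndex+1:]: keep only the classes after the ignore index."""
    return classes[ignore_index + 1:]

def drop_only(classes, ignore_index):
    """Drop just the ignored class, wherever it sits."""
    return classes[:ignore_index] + classes[ignore_index + 1:]

classes = list(range(13))
print(keep_before(classes, 0))  # []  (nothing left, hence the size-0 tensor)
print(keep_after(classes, 0))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(keep_after(classes, 9))   # [10, 11, 12]
print(drop_only(classes, 9))    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12]
```

Note that the "+1:" form removes only the ignored class when that class is index 0; for an ignore index in the middle it would also discard every class before it.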

Answer 24 · 2018-09-03T14:53:58.000Z

So if I do what you did, I will no longer get the error?

Answer 25 · 2018-09-03T15:27:25.000Z

I did what you suggested, and it no longer gives errors.
The IoU on the VAL set is 91.67 after the first epoch, which I am fairly sure is wrong, since that is far too high for a first epoch. I will wait and see how the following epochs go.

Answer 26 · 2018-09-03T15:33:40.000Z

I think you can just restore the original iouEval.py lines for the true positives, false positives and false negatives. You'll probably get reasonable results after that.

Also, as the warning says, 'volatile=True' is no longer valid and has no effect. So you can try 'with torch.no_grad():' instead.

Answer 27 · 2018-09-04T05:30:55.000Z

@SoumiDas
You wrote: "you can just replace those lines of true positive and false positive and false negative with the original lines of the iouEval.py code".
Sorry, I don't understand. Can you please tell me exactly which lines to change, and what to change them to?

You also wrote: "as the warning goes, 'volatile=True' is not valid anymore and has no effect. So you can try with 'with torch.no_grad():' maybe".
I don't follow this either. Where is volatile=True, and where should torch.no_grad() go?

Answer 28 · 2018-09-04T14:07:43.000Z

For the volatile=True part, do you mean I should change

inputs = Variable(images, volatile=True)    #volatile flag makes it free backward or outputs for eval
targets = Variable(labels, volatile=True)

to

with torch.no_grad():
    inputs = Variable(images)    # gradient tracking is disabled by the enclosing torch.no_grad()
    targets = Variable(labels)

Answer 29 · 2018-09-05T07:09:31.000Z

Yes, right.

The outputs = model(....) line that follows targets = Variable(labels) also needs to be inside the torch.no_grad() block, since that is what stops gradients from being tracked for the forward pass.

What I meant by "you can just replace those lines of true positive and false positive and false negative with the original lines of the iouEval.py code" is to keep the lines below as they were:

tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()

and likewise the following lines that calculate fp and fn.

Hope you get rid of that 90+% IoU in the first epoch after following these steps.
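For context, the original lines sum over the batch, height, and width dimensions (0, 2, 3) but keep the class dimension, so tp, fp, and fn are per-class counts; summing over everything collapses them into a single number. Below is a pure-Python sketch of the usual per-class IoU = tp / (tp + fp + fn) on flattened label lists. It is an illustration with a hypothetical function name, not the repository's iouEval, and its handling of the ignore label is an assumption.

```python
def per_class_iou(pred, target, n_classes, ignore_label=None):
    """Per-class IoU from flat lists of predicted and ground-truth class ids."""
    tp = [0] * n_classes
    fp = [0] * n_classes
    fn = [0] * n_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # pixels with the ignore label do not count at all
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but ground truth says otherwise
            fn[t] += 1  # ground truth class t was missed
    ious = []
    for c in range(n_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denom if denom else None)  # None: class never seen
    return ious

pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 1]
print(per_class_iou(pred, target, n_classes=3))
```

The mean IoU is then the average over the classes that actually occur.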

Answer 30 · 2018-09-05T07:30:51.000Z

@SoumiDas
Thanks a lot for your support.

So this means that for tp, fp, and fn I should keep the original calculation over three dimensions, right? And not change it as I did above?

Answer 31 · 2018-09-05T08:09:19.000Z

Exactly, you're right.

Answer 32 · 2018-10-04T15:12:02.000Z

When I train the decoder, I always get a RuntimeError: CUDNN_STATUS_INTERNAL_ERROR. Can you give me some advice on how to handle it?

Answer 33 · 2018-12-29T10:54:00.000Z

I think the CUDNN_STATUS_INTERNAL_ERROR RuntimeError is normally related to a mismatch between the number of classes in the labels and in the predictions. Otherwise, it may be a faulty cuDNN installation; try installing it manually by downloading it and copying it into the CUDA folder.

I'm closing this issue since the conversation has been over for more than two months. If you keep having issues please reopen!