CARLA Simulator - Semantic Segmentation
Closed this issue · 33 comments
Hello,
Firstly, thanks for this amazing work.
Secondly, I want to train the network on my own dataset from the CARLA Simulator. Are there any tips on how to adapt your implementation to it (it has only 12 semantic classes)?
Hi! Thanks and sorry for my late reply.
I haven't tested it on CARLA, but it should work if you modify the NUM_CLASSES variable in main.py and the "ignoreIndex" that is passed to iouEval. For example, if your ignore class is label "11", replace the line "iouEvalTrain = iouEval(NUM_CLASSES)" with
iouEvalTrain = iouEval(NUM_CLASSES, 11)
and the same for the iouEvalVal line. Hope that works!
Hi @Eromera,
Thanks for the tips. I also need to remove the weights that are not needed in the train function.
I tried it on my dataset and I am getting this error:
  File "/erfnet_pytorch-master/train/erfnet.py", line 20, in forward
    output = torch.cat([self.conv(input), self.pool(input)], 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 170 and 171 in dimension 3 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:111
This is the part of the code that raises the error:
class DownsamplerBlock(nn.Module):
    def __init__(self, ninput, noutput):
        super(DownsamplerBlock, self).__init__()
        self.conv = nn.Conv2d(ninput, noutput-ninput, (3, 3), stride=2, padding=1, bias=True)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.bn = nn.BatchNorm2d(noutput, eps=1e-3)

    def forward(self, input):
        print([self.conv(input).size(), self.pool(input).size()])
        output = torch.cat([self.conv(input), self.pool(input)], 1)
        output = self.bn(output)
        return F.relu(output)
The output of the print line is:
[(1, 13, 256, 341), (1, 3, 256, 341)]
[(1, 48, 128, 171), (1, 16, 128, 170)]
I can see a mismatch in the last dimension (170 vs. 171), but I don't know why.
Do you have any idea what could cause this?
I solved it by setting ceil_mode=True for the nn.MaxPool2d layer.
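The 170 vs. 171 mismatch follows from the output-size formulas of the two branches. Below is a minimal plain-Python sketch, assuming the usual PyTorch output-size formula floor((size + 2*padding - kernel) / stride) + 1, with ceil in place of floor when ceil_mode=True, and no dilation. The helper names conv_out and pool_out are made up for illustration; the sizes 256 and 341 are taken from the printed shapes above.

```python
import math

def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of the 3x3, stride-2, padding-1 conv along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2, ceil_mode=False):
    """Output length of the 2x2, stride-2 max pool along one axis.

    Assumption: no padding or dilation. PyTorch's extra rule that the last
    window must start inside the input is not modelled here; it does not
    trigger for kernel=2, stride=2.
    """
    rounding = math.ceil if ceil_mode else math.floor
    return rounding((size - kernel) / stride) + 1

for size in (256, 341):
    print(size, conv_out(size), pool_out(size), pool_out(size, ceil_mode=True))
```

For an even size such as 256 both branches give 128, so torch.cat succeeds. For an odd size such as 341 the convolution yields 171 while the default pooling yields 170; with ceil_mode=True the pooling also yields 171.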
Now I have another error
Traceback (most recent call last):
  File "main.py", line 506, in <module>
    main(parser.parse_args())
  File "main.py", line 460, in main
    model = train(args, model, True) #Train encoder
  File "main.py", line 232, in train
    loss = criterion(outputs, targets[:, 0])
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "main.py", line 82, in forward
    return self.loss(torch.nn.functional.log_softmax(outputs, dim=1), targets)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/loss.py", line 193, in forward
    self.ignore_index, self.reduce)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 1334, in nll_loss
    return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
RuntimeError: input and target batch or spatial sizes don't match: target [5 x 64 x 85], input [5 x 13 x 64 x 86] at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24
I solved it by manually resizing the target
target = Resize((64, 86), Image.NEAREST)(target)
instead of
target = Resize(int(self.height/8), Image.NEAREST)(target)
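The one-pixel difference between the target (85) and the network output (86) can be reproduced with integer arithmetic. The sketch below rests on several assumptions that the thread does not state outright: a 512x682 input (inferred from the printed shapes and from int(self.height/8) = 64), three stride-2 downsampling stages in the encoder (consistent with the /8), and a Resize that, given a single integer, matches the shorter side and truncates the other one. All function names are illustrative.

```python
def downsample(size):
    """One stride-2 stage: 3x3 conv, stride 2, padding 1."""
    return (size + 2 * 1 - 3) // 2 + 1

def encoder_out(size, stages=3):
    """Apply the assumed three downsampling stages to one axis."""
    for _ in range(stages):
        size = downsample(size)
    return size

def resize_shorter_side(height, width, target):
    """Scale so the shorter side equals `target`; the other side is truncated."""
    if height <= width:
        return target, int(target * width / height)
    return int(target * height / width), target

height, width = 512, 682  # assumed input size, not stated in the thread
print((encoder_out(height), encoder_out(width)))        # (64, 86)
print(resize_shorter_side(height, width, height // 8))  # (64, 85)
```

Each odd intermediate width (341, 171) is rounded up by the network, while the resize scales 682 by 1/8 in one step and lands on 85. Passing the full (height, width) tuple, as in Resize((64, 86), ...) above, sidesteps the difference; deriving that tuple from the input size instead of hard-coding it would keep it valid for other resolutions.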
And I got this error afterwards
main.py:292: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  inputs = Variable(images, volatile=True) #volatile flag makes it free backward or outputs for eval
main.py:293: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  targets = Variable(labels, volatile=True)
main.py:297: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  epoch_loss_val.append(loss.data[0])
Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f877c13ef10>> ignored
Traceback (most recent call last):
  File "main.py", line 507, in <module>
    main(parser.parse_args())
  File "main.py", line 461, in main
    model = train(args, model, True) #Train encoder
  File "main.py", line 304, in train
    iouEvalVal.addBatch(outputs.max(1)[1].unsqueeze(1).data, targets.data)
  File "/erfnet_pytorch-master/train/iouEval.py", line 61, in addBatch
    tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 2)
Then I replaced this
tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
fpmult = x_onehot * (1-y_onehot-ignores) #times prediction says its that class and gt says its not (subtracting cases when its ignore label!)
fp = torch.sum(torch.sum(torch.sum(fpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
fnmult = (1-x_onehot) * (y_onehot) #times prediction says its not that class and gt says it is
fn = torch.sum(torch.sum(torch.sum(fnmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
by
tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True))).squeeze()
fpmult = x_onehot * (1-y_onehot-ignores) #times prediction says its that class and gt says its not (subtracting cases when its ignore label!)
fp = torch.sum(torch.sum(torch.sum(fpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
fnmult = (1-x_onehot) * (y_onehot) #times prediction says its not that class and gt says it is
fn = torch.sum(torch.sum(torch.sum(fnmult, dim=0, keepdim=True))).squeeze()
I am not sure whether this is right, but it seemed the obvious change because both tensors are 1-dim tensors and the second torch.sum tries to sum over dim 2.
On my dataset, the validation IoU is always the same; it doesn't change.
As a result, the model from the first epoch is saved as the best one and is never replaced afterwards. So far I have trained for 7 epochs and the value is still the same.
Hello,
I have a question about the labels you chose for CARLA. Did you use the red-channel image produced with PostProcessing = 'Semantic Segmentation' as your label, or did you convert it into the color-coded segmentation image (where every object has a different color) and provide that as the label?
It'd be really helpful to know!
Thanks in advance!
Regards,
@SoumiDas do you mean instance segmentation?
Exactly, all the labels are encoded in the red channel.
Hi,
Did you encounter the error: RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/pytorch-cpu_1532571975038/work/aten/src/THNN/generic/SpatialClassNLLCriterion.c:110 in your execution?
Thanks!
One more question: did you set ignore_index to 0, since class 0 means None (void) in the CARLA data?
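The assertion cur_target >= 0 && cur_target < n_classes quoted above fires when a target pixel holds a label outside the range of classes the network outputs. A minimal plain-Python sketch of a check that could be run on a label map before training follows. The function name and the toy data are hypothetical, and num_classes=13 is taken from the 13 output channels visible in the earlier traceback.

```python
from collections import Counter

def out_of_range_labels(label_rows, num_classes):
    """Count label values that fall outside 0..num_classes-1."""
    counts = Counter(value for row in label_rows for value in row)
    return {value: n for value, n in counts.items()
            if not 0 <= value < num_classes}

# Toy 2x4 label map; 255 and 13 stand in for unexpected values.
labels = [[0, 1, 12, 255],
          [7, 7, 13, 255]]
print(out_of_range_labels(labels, num_classes=13))
```

An empty result means every label fits the network's output channels. Any value reported here would need to be remapped, or (as an assumption about the loss being used) passed as its ignore_index so that those pixels are skipped.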
I didn't use any ignore_index
I got varying IoU over the training set.
What did you change in the code? Did you get any of the errors that I posted above?
If possible, could you share your Python files with me? Maybe I did something wrong, so I would like to try yours.
Yes, I got all the errors that you posted. I also set the ignore_index, in our case to class 0 since it is None. I made no other changes apart from the ones you mentioned and the ignore_index.
Did you change this
fp = torch.sum(torch.sum(torch.sum(fpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
to one dimension?
I set ignore_index=9, but I received this error:
fpmult = x_onehot * (1-y_onehot-ignores) #times prediction says its that class and gt says its not (subtracting cases when its ignore label!)
RuntimeError: The size of tensor a (6) must match the size of tensor b (86) at non-singleton dimension 3
Did you get this before?
No, I kept it the same as the original code. Also, I have only tried 2 epochs so far to check that it works. The training-set IoU is increasing, but the validation-set IoU is constant as of now. I'll run more epochs and check.
@SoumiDas yes, this happens to me too. What about the error above?
Yes, you're getting that error because of these lines:
x_onehot = x_onehot[:, :self.ignoreIndex]
y_onehot = y_onehot[:, :self.ignoreIndex]
These lines are written to keep only the labels other than ignore_index (in their case 19, the last one). For CARLA, the index to ignore is 0 since it is None. I changed them to
x_onehot = x_onehot[:, self.ignoreIndex+1:]
y_onehot = y_onehot[:, self.ignoreIndex+1:]
and got rid of the error I was getting:
RuntimeError: The size of tensor a (0) must match the size of tensor b (86) at non-singleton dimension 3
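Both slices depend on where the ignored class sits: [:, :ignoreIndex] keeps only the classes before it, which is right when it is the last class, and [:, ignoreIndex+1:] keeps only the classes after it, which is right when it is class 0. A position-independent alternative is to skip the ignored pixels and leave that one class out of the average. The plain-Python sketch below illustrates the idea on flat label lists; it is not the repository's iouEval, and the function name is made up.

```python
def per_class_iou(pred, target, num_classes, ignore_index=None):
    """Return {class: IoU}, skipping pixels whose ground truth is ignore_index."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue              # ignored pixels count for no class
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1            # predicted class p, ground truth differs
            fn[t] += 1            # missed a pixel of class t
    ious = {}
    for c in range(num_classes):
        if c == ignore_index:
            continue              # drop only the ignored class, wherever it sits
        denom = tp[c] + fp[c] + fn[c]
        if denom:                 # classes that never occur are left out
            ious[c] = tp[c] / denom
    return ious

pred   = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 0]
ious = per_class_iou(pred, target, num_classes=3, ignore_index=0)
print(ious, sum(ious.values()) / len(ious))
```

In the toy data the two pixels labelled 0 are skipped, so class 1 scores 1/2 and class 2 scores 2/3, and the mean is taken over those two classes only.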
So if I do what you did, I will no longer get the error?
I did what you suggested, and it no longer gives errors.
The first IoU on the VAL set is 91.67 after the first epoch, which I am sure is wrong; it is too high for a first epoch. I will wait and see how the other epochs go.
I think you can just replace those lines of true positive and false positive and false negative with the original lines of the iouEval.py code. You'll probably get reasonable results after that.
Also, as the warning says, 'volatile=True' is not valid anymore and has no effect. So you can try 'with torch.no_grad():' instead, maybe.
@SoumiDas
"you can just replace those lines of true positive and false positive and false negative with the original lines of the iouEval.py code"
Sorry, I don't understand what you said. Can you please tell me exactly which lines to change, and with what?
"Also, as the warning goes, 'volatile=True' is not valid anymore and has no effect. So you can try with 'with torch.no_grad():' maybe.."
I also don't get this. Where is volatile=True or torch.no_grad?
For the volatile=True part, do you mean I should change
inputs = Variable(images, volatile=True) #volatile flag makes it free backward or outputs for eval
targets = Variable(labels, volatile=True)
to
with torch.no_grad():
    inputs = Variable(images) #volatile flag makes it free backward or outputs for eval
    targets = Variable(labels)
Yes, right.
Also, the outputs = model(....) line after the targets = Variable(labels) line needs to be inside the torch.no_grad() block, since that is what stops gradients from being computed for backpropagation.
And what I mean by "you can just replace those lines of true positive and false positive and false negative with the original lines of the iouEval.py code" is: keep the line below as it was,
tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
and likewise the following lines that calculate fp and fn.
Hope you get rid of that 90+% IoU in the first epoch after following these steps.
I think the RuntimeError CUDNN_STATUS_INTERNAL_ERROR is normally related to a mismatch between the number of classes in the labels and in the predictions; otherwise it may be a faulty cuDNN installation. Try installing cuDNN manually by downloading it and copying it into the CUDA folder.
I'm closing this issue since the conversation has been over for more than two months. If you keep having issues please reopen!