e-lab/ENet-training

Ground truth pixel values in CamVid


Hi,

I trained a model on the CamVid dataset with the same parameter settings as in the ENet paper.
After training, I ran a single image through the trained model. The results are:
[Image: input frame 0001tp_006690]
[Image: model output 0001tp_006690_out_vec_image]

Judging from this single test image, the trained model seems to perform well.

Using this image and its annotated ground truth, I tried to compute the pixel accuracy.
I assumed that each pixel value in the annotated image is the class index,
for example 0 for background, 1 for sky, and so on.
After the forward pass, I compared each pixel of the output with its ground-truth value
and got an accuracy of only 0.06 (6%).
[Image: 0001tp_006690]
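The per-pixel comparison described above can be sketched as follows. This is a minimal illustration in plain Python rather than the thread's Torch/Lua; `pixel_accuracy` and `ignore_label` are hypothetical names, not part of ENet-training. The toy data shows that pixel accuracy is only meaningful when the prediction and the ground truth use the same label convention: an off-by-one shift between them can drive the score toward zero even when the segmentation itself is good.

```python
from typing import Optional, Sequence


def pixel_accuracy(pred: Sequence[int], gt: Sequence[int],
                   ignore_label: Optional[int] = None) -> float:
    """Fraction of pixels whose predicted label equals the ground-truth label.

    Both inputs are flattened label maps and must use the SAME label
    convention (same class order, same 0- or 1-based start).
    Pixels whose ground truth equals ignore_label are skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same size")
    correct = 0
    counted = 0
    for p, g in zip(pred, gt):
        if ignore_label is not None and g == ignore_label:
            continue  # do not score pixels of the ignored class
        counted += 1
        if p == g:
            correct += 1
    return correct / counted if counted else 0.0


# Toy 2x3 label maps, flattened row by row (hypothetical values).
gt = [1, 1, 2, 3, 3, 0]
pred_same = [1, 1, 2, 3, 0, 0]             # same convention as gt
pred_shifted = [p + 1 for p in pred_same]  # same prediction, labels off by one

print(pixel_accuracy(pred_same, gt))     # 0.8333333333333334 (5 of 6 pixels)
print(pixel_accuracy(pred_shifted, gt))  # 0.0
```

The two predictions describe the same segmentation; only the label indexing differs, which is why both maps should be brought to one convention before scoring.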

Then I noticed the following code in loadCamVid.lua:
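```lua
-- load corresponding ground truth (raw label values in the file are 0-11)
rawImg = image.load(gtPath[i], 1, 'byte'):squeeze():float() + 2  -- shift labels to 2-13
local mask = rawImg:eq(13):float()  -- 1 where the raw value was 11, 0 elsewhere
rawImg = rawImg - mask * #classes   -- with #classes == 12, value 13 becomes 1
```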

In the original ground-truth image the pixel values are 0-11,
but after the processing above they are 1-11, so index 0 is lost.

What am I doing wrong when calculating the accuracy?
Many thanks.

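After the code above, the range is 1-12, not 1-11: raw values 0-10 are shifted to 2-12,
and raw value 11 becomes 13, which the mask maps to 13 - #classes = 1 (with 12 classes).
The result can then be shifted to 0-11 for classification with 0-based labels.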
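The remapping can be traced value by value with a small plain-Python mirror of the three Lua lines. This is a sketch, not the ENet-training code itself; `loader_remap`, `to_zero_based`, and `NUM_CLASSES` are hypothetical names, and `NUM_CLASSES = 12` is an assumption inferred from the 1-12 range stated above.

```python
from typing import List

# Assumption: #classes in loadCamVid.lua is 12, as implied by the 1-12 range.
NUM_CLASSES = 12


def loader_remap(raw: List[int], num_classes: int = NUM_CLASSES) -> List[int]:
    """Mirror of the Lua snippet: add 2, then subtract num_classes where the value is 13."""
    shifted = [v + 2 for v in raw]
    return [v - num_classes if v == 13 else v for v in shifted]


def to_zero_based(labels: List[int]) -> List[int]:
    """Shift 1-based (Torch-style) labels down to 0-based labels."""
    return [v - 1 for v in labels]


original = list(range(12))         # every raw label value 0..11
one_based = loader_remap(original)
zero_based = to_zero_based(one_based)

print(one_based)   # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1]
print(zero_based)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0]
```

The 0-based labels are not the raw file values: raw 11 ends up as 0, and every other raw value v ends up as v + 1. Predictions should therefore be compared against ground truth passed through the same remapping, not against the raw annotation image.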

Closing the issue.