The max value in the original ground truth is 3 instead of 1.

Question


haranrk opened this issue 3 years ago · 6 comments

Dear Sir, I have a quick question. I calculated the Jaccard index between the original ground-truth mask (read at level 0 with the ReadWholeSlideImage function) and the mask generated after thresholding (prd_im_fll_dict), but I got low values. When I checked the min and max values of each, I was surprised to find that the max value in the original ground truth is 3 instead of 1. Can you explain why?
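If the Jaccard index is computed on the raw arrays, ground-truth pixels with values such as 2 or 3 never match the 1s in the prediction. Below is a minimal sketch that binarizes both masks before comparing them. It assumes any nonzero pixel is foreground; `jaccard_index` is a hypothetical helper, not a function from the repository.

```python
import numpy as np


def jaccard_index(gt: np.ndarray, pred: np.ndarray) -> float:
    """Jaccard index (intersection over union) of two same-shape masks.

    Assumption: every nonzero pixel counts as foreground, so stray values
    like 2 or 3 in the ground truth are treated the same as 1.
    """
    if gt.shape != pred.shape:
        raise ValueError(f"shape mismatch: {gt.shape} vs {pred.shape}")
    gt_fg = gt > 0
    pred_fg = pred > 0
    intersection = np.logical_and(gt_fg, pred_fg).sum()
    union = np.logical_or(gt_fg, pred_fg).sum()
    # Two empty masks: return 1.0 by convention (a choice, not a standard).
    return 1.0 if union == 0 else float(intersection) / float(union)


gt = np.array([[0, 1, 3], [0, 2, 0]], dtype=np.uint8)
pred = np.array([[0, 1, 1], [1, 0, 0]], dtype=np.uint8)
print(jaccard_index(gt, pred))  # 0.5
```

On a full-resolution slide each boolean intermediate is as large as the mask itself, so computing this on a downsampled level or tile by tile may be more practical.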

Originally posted by @codeskings in #19 (comment)

Answer 1 · 2021-04-11T13:36:51.000Z

@codeskings
Can you share which image has the min and max as 0 and 1?

Answer 2 · 2021-04-11T13:58:31.000Z

I am testing with the first training image, id Training_phase_1_001. I was surprised by its original mask: it is supposed to be a binary image with black and white pixels, i.e., with values 0 and 1, so why do we get 3? The predicted mask has a max value of 1.

Answer 3 · 2021-04-11T14:04:27.000Z

Can you check the histogram (numpy has a histogram function) and see how many pixels have the value 3? It's most likely a conversion error.
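Because a mask holds integers, an exact per-value count is often easier to read than binned histogram output. Here is a short sketch using `np.unique` with `return_counts=True`; `value_counts` is a hypothetical helper name.

```python
import numpy as np


def value_counts(mask: np.ndarray) -> dict[int, int]:
    """Exact number of pixels for each distinct value in the mask."""
    values, counts = np.unique(mask, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


mask = np.array([[0, 0, 1], [1, 3, 0]], dtype=np.uint8)
print(value_counts(mask))  # {0: 3, 1: 2, 3: 1}
```

For a full-resolution uint8 mask, `np.bincount(mask.ravel())` yields the same counts (indexed by value) without the sort that `np.unique` performs.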

Answer 4 · 2021-04-11T14:52:14.000Z

I used the following method to read the binary mask of the training image at level 0:
mask_obj, mask_data, level = ReadWholeSlideImage(mask_path,0,False)

Then, computing a histogram of mask_data, I got the following values:
histogram is (array([1639657623, 0, 0, 356553924, 0, 0, 408125, 0, 0, 31], dtype=int64),
array([0., 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3. ]))
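Since the mask stores integers, each nonempty bin above can only contain one value: [0, 0.3) holds 0, [0.9, 1.2) holds 1, [1.8, 2.1) holds 2, and the last bin, [2.7, 3.0], holds 3. The arithmetic below uses only the posted counts to show how large the non-binary share is.

```python
# Pixel counts per value, read off the nonempty bins of the histogram above.
counts = {0: 1639657623, 1: 356553924, 2: 408125, 3: 31}

total = sum(counts.values())
non_binary = counts[2] + counts[3]
print(total)                        # 1996619703
print(f"{non_binary / total:.4%}")  # 0.0204%
```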

However, when I removed the .convert('L') call from the function, it produced the histogram below, but it failed to produce a valid figure with the imshow function.

histogram is (array([5989859109, 0, 0, 0, 0, 0, 0, 0, 0, 1996619703],
dtype=int64), array([ 0. , 25.5, 51. , 76.5, 102. , 127.5, 153. , 178.5, 204. , 229.5, 255. ]))
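The two histograms can be cross-checked using only the numbers posted above. Without `.convert('L')` the total is exactly four times the grayscale total. This fits a four-channel image: if ReadWholeSlideImage is built on OpenSlide's `read_region`, that call returns RGBA. That reading is an inference, not something the thread confirms. One pixel-count's worth of values falls in the top bin (which would fit a fully opaque alpha channel) and three counts' worth fall in [0, 25.5), where small label values in R, G and B would land.

```python
# Totals of the two histograms reported in this thread.
gray_total = 1639657623 + 356553924 + 408125 + 31  # with .convert('L')
raw_total = 5989859109 + 1996619703                 # without .convert('L')

print(raw_total // gray_total, raw_total % gray_total)  # 4 0
print(5989859109 == 3 * gray_total)                     # True
print(1996619703 == gray_total)                         # True
```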

Answer 5 · 2021-04-13T07:55:36.000Z

It looks like the non-binary values in the image come from conversion errors. You can fix them by thresholding. From the histogram you can see that the number of non-binary values is extremely small compared to the number of binary ones.
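Below is a minimal thresholding sketch. It assumes the stray values 2 and 3 belong to the foreground and maps every nonzero pixel to 1. If they should be background instead, `mask == 1` is the alternative. `binarize_mask` is a hypothetical helper.

```python
import numpy as np


def binarize_mask(mask: np.ndarray) -> np.ndarray:
    """Map every nonzero value (1, 2, 3, ...) to 1 and keep 0 as 0.

    Assumption: values above 1 are foreground. Use (mask == 1) instead
    if they should be treated as background.
    """
    return (mask > 0).astype(np.uint8)


mask = np.array([[0, 1, 3], [2, 0, 1]], dtype=np.uint8)
print(binarize_mask(mask))
# [[0 1 1]
#  [1 0 1]]

# In-place variant for an unsigned integer mask: clip everything above 1
# down to 1 without allocating another full-size array.
np.minimum(mask, 1, out=mask)
```

At roughly two billion pixels, the comparison and the `astype` each allocate another full-size array. The in-place `np.minimum` line gives the same 0/1 result for unsigned masks without that extra memory.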

imshow may not work because it's such a large image. Are you sure you have enough RAM to open the image? For me, I think it took around 20-30 GB of RAM.
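A rough sketch of the raw array sizes follows, using the pixel count implied by the histogram totals in this thread (1,996,619,703 values per channel). It counts only the arrays themselves; the 20-30 GB figure above may also include temporary copies, which this does not estimate. The hypothetical `preview` helper strides the array so that imshow only receives a small view. Reading a coarser pyramid level, if the mask file has one, is another option.

```python
import numpy as np

# Pixels per channel, taken from the histogram totals in this thread.
n_pixels = 1996619703
GIB = 1024 ** 3

print(f"uint8, 1 channel (after .convert('L')): {n_pixels * 1 / GIB:.1f} GiB")  # 1.9
print(f"uint8, 4 channels (RGBA):               {n_pixels * 4 / GIB:.1f} GiB")  # 7.4
print(f"float64, 1 channel:                     {n_pixels * 8 / GIB:.1f} GiB")  # 14.9


def preview(mask: np.ndarray, max_side: int = 2048) -> np.ndarray:
    """Strided view whose longer side is at most max_side (no copy is made)."""
    step = max(1, -(-max(mask.shape[:2]) // max_side))  # ceiling division
    return mask[::step, ::step]


small = preview(np.zeros((5000, 3000), dtype=np.uint8))
print(small.shape)  # (1667, 1000)
```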

Answer 6 · 2021-04-15T09:38:15.000Z

Thank you for the suggestion, I did the thresholding and I am now retraining the model.