hamidriasat/UNet-3-Plus

Question about dataset mask value

Closed this issue · 2 comments

Hi, thank you for your nice work!

I have a question regarding the mask value of the dataset. The README mentions that each pixel of the mask represents a class. Should I understand this as values ranging from 1 to num_classes, or is there another way to define these values? I would appreciate a detailed explanation.

Thank you.

You're welcome!

In semantic segmentation datasets, each pixel in the mask image represents a class label. Consider a dataset that focuses on two objects of interest: a person and a car. For these two objects, your mask will define three classes:

  • Background: Represented by 0, this class covers areas of the image that are not relevant to the segmentation task, such as sky, ground, or other parts not containing the person or the car.
  • Person: Represented by 1, this class includes pixels corresponding to the person in the image.
  • Car: Represented by 2, this class includes pixels corresponding to the car in the image.

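Therefore, each pixel in the mask will be assigned a value of 0, 1, or 2, indicating whether it belongs to the background, person, or car class respectively. In other words, with background counted as one of the classes, the values in this example run from 0 to 2 (that is, 0 to num_classes - 1), not from 1 to num_classes. The model uses these per-pixel labels to learn which region belongs to which class during training, and it predicts the same label values at inference.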
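As a rough illustration of the labeling scheme above, here is a minimal NumPy sketch. The 4x4 mask, the `validate_mask` and `to_one_hot` helpers, and the `NUM_CLASSES` constant are all hypothetical names made up for this example; they are not taken from this repository's code. The one-hot step is only an assumption about how a training pipeline might consume such a mask.

```python
import numpy as np

# Assumption for this sketch: 3 classes, as in the example above
# (0 = background, 1 = person, 2 = car).
NUM_CLASSES = 3

# Hypothetical 4x4 mask: each entry is the class index of that pixel.
mask = np.array(
    [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 0, 0],
        [2, 2, 0, 0],
    ],
    dtype=np.uint8,
)


def validate_mask(mask, num_classes):
    """Return the sorted class indices in the mask, or raise if any
    value falls outside the range 0 .. num_classes - 1."""
    values = np.unique(mask)
    if values.min() < 0 or values.max() >= num_classes:
        raise ValueError(
            f"mask values {values.tolist()} fall outside 0..{num_classes - 1}"
        )
    return values


def to_one_hot(mask, num_classes):
    """Convert an (H, W) mask of class indices to an (H, W, num_classes)
    one-hot array by indexing rows of an identity matrix."""
    return np.eye(num_classes, dtype=np.float32)[mask]


class_ids = validate_mask(mask, NUM_CLASSES)
one_hot = to_one_hot(mask, NUM_CLASSES)

print(class_ids)      # class indices present in the mask
print(one_hot.shape)  # (height, width, num_classes)
```

A quick check like `validate_mask` can catch masks saved with values such as 255 for the foreground, which would need remapping to class indices before training.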

If you need further clarification or have additional questions, feel free to ask!

Thank you for your kind reply.

I understand your explanation completely.

Thank you!