khdlr/HED-UNet

During training, are the inputs the original image and the binary image?

lisongyu111 opened this issue · 17 comments

khdlr commented

Yes, you got that right.

The img and target parameters to the full_forward method used during training are exactly the original image (img) and the binary target segmentation (target).
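
For illustration, a batch fed to full_forward would look something like this (the shapes here are hypothetical, not taken from the repo):

import torch

# hypothetical example batch (shapes are illustrative only)
img    = torch.randn(4, 3, 256, 256)                 # original RGB images
target = (torch.rand(4, 1, 256, 256) > 0.5).float()  # binary masks valued in {0, 1}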

Cheers, Konrad

Thanks for answering! Where in the code are the original images and the binary images loaded?

khdlr commented

You can download the INRIA data here: https://project.inria.fr/aerialimagelabeling/

Here's the code that reads in the dataset: https://github.com/khdlr/HED-UNet/blob/master/deep_learning/utils/data.py

Thanks for the reply! Which images should go into the AerialImageDataset folder and the scenes text files in the code?

khdlr commented

The AerialImageDataset is the only folder that is being used. scenes is not used anymore; I will update the code and delete the get_batch function. Thanks for catching my error!

Can you show me how you created the dataset files? I want to train on my own dataset.

khdlr commented

The dataset was not created by me; I just wrote the code that loads it at training time.

What you need to do to train on your own data is to implement a torch.utils.data.Dataset that returns pairs of images and ground-truth annotations.

Then you can change the get_dataset function in data_loading.py to use your custom dataset instead of the pre-configured InriaDataset.

Let me know if that helps!

I have 256×256 original images and binary images of aerial cities here, but I don't know how to train this network.

khdlr commented

Okay, so you'll subclass torch.utils.data.Dataset like this:

import torch
import imageio

class MyCustomDataset(torch.utils.data.Dataset):
  def __init__(self, image_paths, target_paths):
    # Store the lists of file paths for images and their binary annotations
    self.images  = image_paths
    self.targets = target_paths

  def __getitem__(self, index):
    # imageio.imread works for common formats; use any image loader you like
    image  = imageio.imread(self.images[index])
    target = imageio.imread(self.targets[index])
    return image, target

  def __len__(self):
    return len(self.images)

Then you can simply change data_loading.py to use MyCustomDataset instead of InriaDataset.
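
For example, a change along these lines (just a sketch; the actual signature of get_dataset in data_loading.py may differ, and the directory layout below is hypothetical):

from glob import glob

def get_dataset(mode):
  # previously this returned the pre-configured InriaDataset;
  # adapt the paths to wherever your data lives
  image_paths  = sorted(glob(f'my_data/{mode}/images/*.png'))
  target_paths = sorted(glob(f'my_data/{mode}/targets/*.png'))
  return MyCustomDataset(image_paths, target_paths)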

Hope that helps.

Thank you

khdlr commented

That is odd. I used fairly large pictures for training (768x768), but 256x256 should work.

Can you confirm that your masks are valid? What is the output if you add print(torch.unique(target)) to the full_forward function in train.py?
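
For reference, with valid masks that check should print something like this (assuming float masks; integer masks would print tensor([0, 1]) instead):

print(torch.unique(target))
# a valid binary mask prints: tensor([0., 1.])
# something like tensor([0., 255.]) means the masks still need binarizing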

khdlr commented

Not entirely sure what you mean by original / two-difference diagrams.

Another issue might be data scaling - what is the output when you add print('img', torch.min(img), torch.mean(img), torch.max(img)) in full_forward?
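
If the values come out in [0, 255], a quick fix is to rescale when loading; here is a sketch, assuming uint8 images loaded as numpy arrays (the helper name is made up):

import numpy as np

def to_unit_range(image):
  # scale uint8 pixel values from [0, 255] to [0, 1] before feeding the network
  return image.astype(np.float32) / 255.0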

As input, you have given the images and their segmentation masks. I want to know what ground truth you used when modeling the edge detection part. It seems a Sobel kernel has been used for the edge calculation, yet the paper states that your results are compared against Sobel. Please explain the edge detection part. Thank you.

khdlr commented

At training time, we compute the edge GT by applying Sobel to the segmentation GT for cases where no edge GT is available. As these segmentation masks are valued in {0,1}, we can recover perfect edge masks.

The "Sobel" comparison in the paper is referring to applying Sobel directly to the imagery at test time, which will generate much worse edge masks.