Is the image input during the training process the original image and the binary image?
lisongyu111 opened this issue · 17 comments
Yes, you got that right.
The `img` and `target` parameters to the `full_forward` method used during training are exactly the original image (`img`) and the binary target segmentation (`target`).
Cheers, Konrad
Thanks for answering! Where are the original pictures and binary pictures in the code?
You can download the INRIA data here: https://project.inria.fr/aerialimagelabeling/
Here's the code that reads in the dataset: https://github.com/khdlr/HED-UNet/blob/master/deep_learning/utils/data.py
Thanks for the reply! Which images go into the `AerialImageDataset` and `scenes` folders referenced in the code?
The `AerialImageDataset` folder is the only one being used. `scenes` is not used anymore; I will update the code and delete the `get_batch` function. Thanks for catching my error!
Can you show me how you created the data file? I want to train my own data set.
The dataset was not created by me; I just wrote the code that loads the data at training time.
To train on your own data, you need to implement a `torch.utils.data.Dataset` that returns pairs of images and ground-truth annotations. Then you can change the `get_dataset` function in `data_loading.py` to use your custom dataset instead of the pre-configured `InriaDataset`. Let me know if that helps!
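For concreteness, here is a minimal sketch of that swap (illustrative only: the real `get_dataset` in `data_loading.py` may have a different signature, and `MyCustomDataset` stands in for whatever Dataset class you write):

```python
# data_loading.py -- sketch only, the actual function signature may differ
def get_dataset(*args, **kwargs):
    # Previously this returned the pre-configured InriaDataset;
    # return your own Dataset implementation instead.
    return MyCustomDataset(*args, **kwargs)
```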
I have 256*256 original images and binary images of aerial cities here, but I don’t know how to train this net
Okay, so you'll subclass `torch.utils.data.Dataset` like this:
```python
import torch

class MyCustomDataset(torch.utils.data.Dataset):
    def __init__(self):
        # Do whatever initialization you need:
        self.images = <list_of_image_paths>
        self.targets = <list_of_binary_annotations>

    def __getitem__(self, index):
        # "load_image" can be imageio.imread for example
        image = load_image(self.images[index])
        target = load_image(self.targets[index])
        return image, target

    def __len__(self):
        return len(self.images)
```
Then you can simply change `data_loading.py` to use `MyCustomDataset` instead of `InriaDataset`.
Hope that helps.
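As a quick sanity check (just a sketch, assuming your `__getitem__` returns arrays or tensors that PyTorch's default collation can stack), you can wrap the custom dataset in a `DataLoader` and look at one batch before starting a full training run:

```python
from torch.utils.data import DataLoader

# Inspect a single batch; the batch size here is arbitrary.
loader = DataLoader(MyCustomDataset(), batch_size=4, shuffle=True)
images, targets = next(iter(loader))
print(images.shape, targets.shape)  # exact shapes depend on how load_image returns the data
```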
Thank you
That is odd. I used fairly large pictures for training (768x768), but 256x256 should work.
Can you confirm that your masks are valid? What is the output if you add `print(torch.unique(target))` to the `full_forward` function in `train.py`?
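For reference, assuming `target` is a tensor inside `full_forward`, a clean binary mask should produce output like this (illustrative values):

```python
print(torch.unique(target))
# Expected for a valid binary mask:
#   tensor([0., 1.])
# If you see 0 and 255 instead, the masks still need to be rescaled.
```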
Not entirely sure what you mean by original / two-difference diagrams.
Another issue might be data scaling: what is the output when you add `print('img', torch.min(img), torch.mean(img), torch.max(img))` in `full_forward`?
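If that print shows a maximum around 255, the images are probably still in the raw 8-bit range. A simple fix (an assumption about your loading code, not the repository's preprocessing) is to rescale them in your `__getitem__`:

```python
# In MyCustomDataset.__getitem__, after loading the image as a NumPy array:
image = torch.from_numpy(image).float() / 255.0  # scale pixel values to [0, 1]
```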
As input, you have given the images and their segmented masks. I want to know what ground truth you considered while modeling the edge detection part. It seems a Sobel kernel has been used for edge calculation (yet the paper states that your results are compared against Sobel). Please clarify the edge detection part. Thank you.
At training time, we compute the edge GT by applying Sobel to the segmentation GT for cases where no edge GT is available. As these segmentation masks are valued in {0,1}, we can recover perfect edge masks.
The "Sobel" comparison in the paper is referring to applying Sobel directly to the imagery at test time, which will generate much worse edge masks.