Input shapes at inference
Hello :)
First of all, congratulations on your work. :)
I have a small question about how the input is preprocessed in the inference code. From what I can see, you start by resampling the spacing to [1, 1, 1]. By doing so, my (576, 768, 232) input image becomes (173, 230, 140), which is also the final shape of the output segmentation. Before feeding the image to the model, there is also some padding up to the next multiple of 32, so the shape becomes (192, 256, 160). What I don't understand is how the model, which is trained at shape 160^3, still works when fed the larger shape (192, 256, 160). Is there some cropping I don't see? Is the image maybe split into patches and the output then reconstructed?
Hi and thanks for the interest.
You got everything right about the preprocessing: we first resample to 1 mm isotropic (i.e. the resolution at which the network has been trained) and then pad to a multiple of 32 (so that we can apply 5 MaxPooling layers, each of which divides the size by 2).
The training and testing shapes don't necessarily have to be the same. Once the network is trained, you can apply it to inputs of any shape (that's the beauty of fully convolutional networks)! You could technically also train with inputs of different shapes, but it makes debugging more tedious, which is why I kept the training shape constant. So to answer your question, there's no hidden cropping or splitting into patches; we just process each image in one go.
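To illustrate the point (a toy example, not the repo's architecture, using tf.keras purely for illustration): a fully convolutional 3D model built with unspecified spatial input dimensions runs on any volume whose sides are divisible by the total downsampling factor.

```python
# Toy illustration (NOT the repo's code): a fully convolutional 3D network with
# spatial dims left as None accepts any input shape divisible by its downsampling.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(None, None, None, 1))            # spatial dims unspecified
x = layers.Conv3D(8, 3, padding='same', activation='relu')(inp)
x = layers.MaxPooling3D(2)(x)                              # divides each dim by 2
x = layers.Conv3D(8, 3, padding='same', activation='relu')(x)
x = layers.UpSampling3D(2)(x)                              # restores the original dims
out = layers.Conv3D(2, 1, activation='softmax')(x)         # e.g. 2 output classes
model = tf.keras.Model(inp, out)

# The same weights run on a 160^3 training crop or a padded (192, 256, 160) test volume:
print(model.predict(np.zeros((1, 160, 160, 160, 1), dtype='float32')).shape)
print(model.predict(np.zeros((1, 192, 256, 160, 1), dtype='float32')).shape)
```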
Just note that the segmentations given by the network are then cropped back ("unpadded") to undo the padding to a shape divisible by 32.
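If it helps, here is a minimal numpy-only sketch of that pad/unpad logic (names are illustrative, not the repo's; the actual code may pad symmetrically rather than at the end, and it assumes the volume has already been resampled to 1 mm isotropic):

```python
# Hedged sketch of padding to the next multiple of 32 and cropping back afterwards.
import numpy as np

def pad_to_multiple(volume, multiple=32):
    """Zero-pad each axis up to the next multiple of `multiple`."""
    target = [int(np.ceil(s / multiple)) * multiple for s in volume.shape]
    pad_width = [(0, t - s) for s, t in zip(volume.shape, target)]
    return np.pad(volume, pad_width), volume.shape

def unpad(segmentation, original_shape):
    """Crop the network output back to the pre-padding shape."""
    return segmentation[tuple(slice(0, s) for s in original_shape)]

resampled = np.zeros((173, 230, 140), dtype='float32')  # shape from the example above
padded, orig_shape = pad_to_multiple(resampled)          # -> (192, 256, 160)
# ... run the network on `padded` ...
seg = unpad(padded, orig_shape)                          # -> back to (173, 230, 140)
```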
Hope this helps
Benjamin
Thanks Benjamin, now it is very clear!
I have followed your tutorial and your suggestions, and I have started my own training. I have a 2-class segmentation problem, which requires segmenting the brain and the vessels together. You can see below the results I get after a few epochs.
The results on fake generated images (like the example) look pretty great: the brain and the vessels are correctly segmented. However, when I run the code on a real image, I get bad results for the brain (see the second example).
The vessels are actually fine; what looks bad is the brain. My guess is that, since the generated images like example 1 look like skull-stripped images, the network is biased towards finding a homogeneous background. For this reason, what the network segments as brain in the real scenario is everything except the black background. But this is just my hypothesis. Since you actually made it work (the checkpoint you provided segments the brain with very good results), do you have any advice? Have you ever encountered a similar problem? Maybe the network just needs more training, as so far I have only trained for 9 epochs x 5000 iterations.
Thanks in advance,
Francesco
Hmm, so I don't know which training label maps you're using, but those should definitely include whole-brain label maps, otherwise the network doesn't learn how to properly deal with extra-cerebral tissues.
And yeah, you would need way, way more training than 9x5000 iterations. I think the correct way to go about this is to have a validation dataset of real images with corresponding GT segmentations, so that you can keep track of the scores during training (see the validate.py file).
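As a rough illustration of what such tracking boils down to (this is not validate.py, just a hedged numpy sketch with made-up names):

```python
# Hedged sketch (not the repo's validate.py): per-label Dice between predicted
# segmentations and ground truth, averaged over a validation set.
import numpy as np

def dice(pred, gt, label):
    """Dice coefficient for one label in two integer label maps."""
    p, g = (pred == label), (gt == label)
    denom = p.sum() + g.sum()
    return 2.0 * np.logical_and(p, g).sum() / denom if denom > 0 else np.nan

def validation_scores(predictions, ground_truths, labels=(1, 2)):
    """`predictions`/`ground_truths` are lists of integer arrays loaded from the
    validation images; labels 1 = brain, 2 = vessels in this hypothetical example."""
    return {lab: np.nanmean([dice(p, g, lab) for p, g in zip(predictions, ground_truths)])
            for lab in labels}
```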
Maybe it is not very clear from the first image provided as an example, but I do have labels for the whole brain and for the vessels. However, I have no label separating the extra-cerebral tissue from the black background. Did you also have those when you trained your model?
I am already keeping track of the scores via validation, following the tutorial, thanks for the suggestion! I am definitely going to train for longer; I just want to be sure I am doing everything right first, since the training takes a long time.
Sorry, I meant whole-head label maps (not whole-brain). So yeah, you do need labels for extra-cerebral tissues. Keep in mind that each label is rendered with a single Gaussian intensity distribution, so every label should more or less correspond to a homogeneous tissue. Having only one label that encompasses all extra-cerebral tissues is therefore probably not a good idea (I don't know if that's your case, I'm just saying ;)). You need more labels to render the different tissue types (muscle, fat, bones, eyes, fluid, arteries, etc.), otherwise the network won't know how to deal with these on real test scans. Btw, this also applies to the brain: you can't have only one "brain" label; the bare minimum is having labels for white matter / grey matter / CSF.
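To make the intensity point concrete, here is a stripped-down toy sketch of per-label Gaussian sampling (the real generator does far more: spatial deformation, bias field, resolution simulation, etc.; all names here are illustrative):

```python
# Toy sketch of per-label Gaussian intensity sampling (not the repo's generator).
# Each label gets its own random mean/std at every generation, so a single
# "everything outside the brain" label would always be rendered as one homogeneous
# tissue, which real heads never look like.
import numpy as np

def synth_image(label_map, rng=np.random.default_rng()):
    image = np.zeros(label_map.shape, dtype='float32')
    for lab in np.unique(label_map):
        mean = rng.uniform(0, 255)   # random mean intensity for this label
        std = rng.uniform(0, 35)     # random within-label variability
        mask = (label_map == lab)
        image[mask] = rng.normal(mean, std, size=mask.sum())
    return image
```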
I definitely put a lot of effort in designing good training label maps. See the journal paper for more details:
arxiv: https://arxiv.org/abs/2107.09559
journal paper: https://www.sciencedirect.com/science/article/pii/S1361841523000506
Thank you very much, I think I know what to do now. I had not fully understood that part about extra-cerebral tissues from the paper, but now it is very clear from your last message. Thanks!
You're welcome!
Closing this for now, but let me know if you have more questions :)