Direction transform (DN)
qAp opened this issue · 20 comments
Deep watershed transform paper: https://arxiv.org/pdf/1611.08303.pdf
Authors' implementation: https://github.com/min2209/dwt/tree/master/DN
The distance transform can be computed with one of:
cv2.distanceTransform
scipy.ndimage.distance_transform_edt
skimage.morphology.medial_axis
These take 2 ms, 129 ms, and 50 ms respectively. scipy.ndimage.distance_transform_edt and skimage.morphology.medial_axis return identical results, while cv2.distanceTransform returns more finely resolved distances.
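As a minimal sketch of what the distance transform returns, the following applies scipy.ndimage.distance_transform_edt to a small toy mask (the mask is made up for illustration). Each cell pixel gets its Euclidean distance to the nearest background pixel.

```python
import numpy as np
from scipy import ndimage

# Toy semantic mask: a 5x5 "cell" centred in a 7x7 frame (illustrative only).
mask = np.zeros((7, 7), dtype=np.uint8)
mask[1:6, 1:6] = 1

# Euclidean distance from each non-zero pixel to the nearest zero pixel.
dtfm = ndimage.distance_transform_edt(mask)
```

The centre of the cell has the largest value, and background pixels stay at 0.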
How to deal with situations where the annotations of adjacent cells overlap each other a bit?
The submission requires that the predicted instances don't overlap.
In a natural image, if car A is in front of car B in the line of sight, and their annotated masks overlap slightly, it obviously makes sense to have car A's mask over car B's mask.
But in an image of cells, it's not obvious which cells are on top of which, and the overlap might just be due to annotation uncertainty, with the cells not overlapping at all. Earlier, this overlap was simply ignored and set to background. However, this means that both cells lose out. It's better to have one cell over the other.
The order in which the cells appear in train.csv naturally provides a way to decide which cells should be on top: the cells that appear later cover the others. But a better way might be to put the smaller cell on top of the larger one, because the overlap area is a larger proportion of the smaller cell than of the larger cell, so it's more important for the smaller cell to keep that area.
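The "smaller cell on top" rule can be sketched as follows. This is only an illustration: the function name paste_smaller_on_top and the toy masks are hypothetical, and it assumes each cell is available as a 2D boolean mask.

```python
import numpy as np

def paste_smaller_on_top(cell_masks):
    """Return a label image in which, where cells overlap, the smaller cell wins.

    cell_masks: list of 2D boolean arrays, one per annotated cell.
    Labels are 1-based indices into cell_masks; 0 is background.
    """
    labels = np.zeros(cell_masks[0].shape, dtype=np.int32)
    # Paint from largest to smallest so that smaller cells overwrite larger ones.
    order = sorted(range(len(cell_masks)),
                   key=lambda i: cell_masks[i].sum(), reverse=True)
    for i in order:
        labels[cell_masks[i].astype(bool)] = i + 1
    return labels

# Two toy cells overlapping at pixel (3, 3).
big = np.zeros((6, 6), dtype=bool)
big[0:4, 0:4] = True      # area 16
small = np.zeros((6, 6), dtype=bool)
small[3:5, 3:5] = True    # area 4
labels = paste_smaller_on_top([small, big])
```

Here the overlapping pixel ends up with the smaller cell's label, regardless of the order in which the masks are listed.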
- Generate training targets by building cell-by-cell.
- Check model architecture. Restrict output values to between -1 and 1.
Actually, it's incorrect to build up the training target by computing uvec for each cell and pasting them onto the image frame. Say cell A is pasted in before cell B, and cell B overlaps cell A. This moves the border of cell A slightly, and yet its uvec doesn't change.
It's better to have all the borders in place over the whole image before computing dtfm (and then uvec) for the whole image in one go. This is also more efficient computationally.
Semantic segmentation with cell borders could be constructed as follows:
- Sort cells by their area in descending order.
- Take a cell, compute its semantic segmentation (semg).
- Dilate this cell's semg.
- Using the dilated semg and the original semg, get the cell's 'border'.
- Paste into the image frame the cell's dilated semg.
- Then remove the cell's border from the image frame.
- Repeat steps 2 to 6 (counting from the top of this list) for all cells in the image.
This way, in the image frame, you should end up with a semantic segmentation, where 1 denotes cell, 0 denotes either background or border between cells that are just touching or overlapping.
Note that this procedure, despite the use of dilation, largely keeps the cells in their original size, because of the removal of the border afterwards. However, when the cell that's being pasted in overlaps with existing cells, these cells do lose a few pixels when the border is removed.
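The procedure above can be sketched roughly as below. Note the assumptions: scipy.ndimage.binary_dilation with a square structuring element stands in for skimage.morphology.dilation, and the function name semseg_with_borders and the toy cells are hypothetical.

```python
import numpy as np
from scipy import ndimage

def semseg_with_borders(cell_masks, width=3):
    """Build a semantic mask in which touching/overlapping cells are separated by 0s.

    cell_masks: list of 2D binary arrays, one per cell.
    """
    frame = np.zeros(cell_masks[0].shape, dtype=bool)
    structure = np.ones((width, width), dtype=bool)  # square structuring element
    # Largest cells first, so smaller cells are pasted on top later.
    for semg in sorted(cell_masks, key=lambda m: m.sum(), reverse=True):
        semg = semg.astype(bool)
        dilated = ndimage.binary_dilation(semg, structure=structure)
        border = dilated & ~semg   # ring of pixels just outside the cell
        frame |= dilated           # paste the dilated cell
        frame &= ~border           # then carve out its border
    return frame.astype(np.uint8)

# Two toy cells that touch along a vertical line.
a = np.zeros((7, 9), dtype=np.uint8)
a[1:5, 1:4] = 1   # area 12
b = np.zeros((7, 9), dtype=np.uint8)
b[1:4, 4:7] = 1   # area 9
out = semseg_with_borders([b, a])
```

In this toy case the larger cell loses its rightmost column when the smaller cell's border is removed, which is the loss of a few pixels mentioned above.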
Then, the distance transform can be computed from this semantic segmentation with borders, and then uvec and discrete watershed energy.
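A rough sketch of going from the semantic segmentation to uvec is given below. It is not the repository's dtfm_to_uvec: it assumes uvec is the gradient of dtfm normalised to unit length, uses np.gradient for the gradient, and the name mask_to_uvec is hypothetical. The discrete watershed energy is left out because the energy levels aren't specified here.

```python
import numpy as np
from scipy import ndimage

def mask_to_uvec(semseg):
    """semseg: 2D array, 1 for cell, 0 for background/border. Returns (H, W, 2)."""
    cell = semseg.astype(bool)
    dtfm = ndimage.distance_transform_edt(cell)
    gy, gx = np.gradient(dtfm)                 # derivatives along rows, columns
    grad = np.stack([gy, gx], axis=-1)
    norm = np.linalg.norm(grad, axis=-1, keepdims=True)
    # Leave the vector at 0 where the gradient vanishes (e.g. on a cell's ridge).
    uvec = np.divide(grad, norm, out=np.zeros_like(grad), where=norm > 0)
    uvec[~cell] = 0                            # background target is 0
    return uvec

mask = np.zeros((9, 9), dtype=np.uint8)
mask[2:7, 2:7] = 1
uvec = mask_to_uvec(mask)
```

Every component of the result lies between -1 and 1, and each cell pixel has either a unit vector or, where the gradient vanishes, the zero vector.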
The dilation of a cell's semg is done with skimage.morphology.dilation, with selem=square(width). It's found by manual inspection of several samples that width=3 is the best value to use. Anything smaller will mean that some overlap borders are not marked out, and anything larger will mean more actual cell area being removed. It's noted that kaggler selem uses 5 in his solution for the 2018 Data Science Bowl.
The semi-supervised images cannot actually be used, because ground truth instance segmentation is needed to generate training targets for the DN and WTN. It cannot be generated from semantic segmentation.
- Use transposed conv2d to increase resolution.
- Gate inputs and activations with ss or ssMask appropriately.
- Initialise parameters.
- Create dataset and datamodule for DirectionNetMock.
- Check loss function.
- Compute each pixel's instance area using the latest semantic segmentation mask, to ensure none are 0. This might have caused division by zero in direction loss.
- Make and visualise the predictions for a few validation samples.
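For the instance-area item in the list above, one possible sketch is shown below. It assumes that instances can be approximated by the connected components of the semantic mask (with overlap borders already removed), and it assigns the background a dummy area of 1 so that nothing downstream divides by zero. The name pixel_instance_area is hypothetical.

```python
import numpy as np
from scipy import ndimage

def pixel_instance_area(semseg):
    """Per-pixel area of the connected cell region that each pixel belongs to."""
    labels, num = ndimage.label(semseg > 0)
    counts = np.bincount(labels.ravel(), minlength=num + 1)
    area = counts[labels].astype(np.float64)
    # Background would otherwise carry its own pixel count; use a dummy
    # area of 1 so that any later division by area is safe.
    area[labels == 0] = 1.0
    return area

semseg = np.zeros((5, 8), dtype=np.uint8)
semseg[1:3, 1:3] = 1   # region of 4 pixels
semseg[1:4, 5:7] = 1   # region of 6 pixels
area = pixel_instance_area(semseg)
```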
Model output appears reasonable:
https://github.com/qAp/sartorius_cell_instance_segmentation_kaggle/blob/main/images/DN_sample_inference.png
Model used: fine-resonance-9
semseg[..., [0]] ---- Cell mask with overlap walls removed.
semseg[..., [1]] ---- Overlap walls.
Actually that was too easy, because the semantic segmentation fed into the model has the overlap borders marked out (i.e. semseg[..., [0]] is used). Really, the correct semantic segmentation for the model input should be semseg[..., [0]] + semseg[..., [1]].
Indeed, current trained models (trained with semseg[..., [0]]) struggle during inference when semseg[..., [0]] + semseg[..., [1]] is used.
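To make the two inputs concrete, here is a small sketch with a made-up semseg array, assuming channel 0 holds the cell mask with overlap walls removed and channel 1 holds the overlap walls, as described above.

```python
import numpy as np

# Toy (H, W, 2) array: two cells separated by a one-pixel overlap wall.
semseg = np.zeros((4, 6, 2), dtype=np.float32)
semseg[1:3, 0:2, 0] = 1   # left cell
semseg[1:3, 3:5, 0] = 1   # right cell
semseg[1:3, 2, 1] = 1     # overlap wall between them

# Indexing with a list keeps the trailing channel axis: shape (H, W, 1).
no_walls = semseg[..., [0]]
with_walls = semseg[..., [0]] + semseg[..., [1]]
```

With the walls added back in, the two cells merge into one connected region, which is what a semantic segmentation model would realistically provide at inference.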
Could try:
- Re-train DN using semseg[..., [0]] + semseg[..., [1]] in place of semseg[..., [0]].
- See if the resulting model does better during inference when given semseg[..., [0]] + semseg[..., [1]] as input.
DN is performing slightly better than before, after training with the correct input semantic segmentation, semseg[..., [0]] + semseg[..., [1]]. It might, however, benefit from more training.
- Re-train DN using the same target uvec, but with the input semseg computed by the Unet.
If the semantic segmentation model's performance is poor, it doesn't make sense to use its predictions in the training of the DN, because the semantic mask is used to mask the image input to the DN. Suppose there are some cells which are in the ground truth semantic mask but not in the model-predicted semantic mask. A training sample based on these masks would be like asking the DN to predict non-zero uvec over regions where there is zero image input. This is an unreasonable expectation, and is like training with poorly labelled data.
Therefore, it's probably still best to stick with training DN using solely ground truth semantic masks, in both the input and output of the DN.
Given the above reasoning, there do not seem to be any major changes needed in the training of DN, so the plan is simply to load the current best model and train it for more epochs to see if the performance can be further improved.
Further training has been carried out as described above. The learning rate schedule is the same, except that the one-cycle maximum learning rate is set to a lower value of 1e-4. The validation loss continued to decrease past the previous lowest value.
- The uvec computed for an edge cell is different when the image mask is zero-padded. Investigate why and how padding should be used, especially during inference.
When uvec is computed using dtfm_to_uvec, the image is effectively padded with symmetric mode first, then the gradient is computed, and then the image is cropped back to the original size.
This means that for an edge cell, the part of its edge that lies on the image border is not taken as the actual wall of the edge cell. It's implicitly assumed that the actual cell extends beyond the image border, reflected symmetrically. It's this 'extended' cell for which the gradient is computed.
The implication for inference is as follows. The original image always needs to be padded so that its height and width are divisible by 32. If it is padded with 0, the part of an edge cell's outline that lies on the image border becomes part of the actual cell wall. The gradient computed on this differs from the one obtained when the cell is assumed to extend beyond the image border, which is what is implicitly assumed during training.
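The effect of the padding mode on an edge cell can be illustrated with a toy mask. This sketch pads the mask itself and then computes the distance transform with scipy.ndimage.distance_transform_edt, which is an assumption made for illustration rather than the exact pipeline used here. The helper padded_dtfm is hypothetical.

```python
import numpy as np
from scipy import ndimage

# Toy mask with a cell touching the left image border.
mask = np.zeros((5, 6), dtype=np.uint8)
mask[1:4, 0:3] = 1

def padded_dtfm(mask, mode, pad=2):
    """Pad the mask, compute the distance transform, crop back to original size."""
    padded = np.pad(mask, pad, mode=mode)
    dtfm = ndimage.distance_transform_edt(padded)
    return dtfm[pad:-pad, pad:-pad]

dt_zero = padded_dtfm(mask, "constant")    # image border acts as a cell wall
dt_sym = padded_dtfm(mask, "symmetric")    # cell is mirrored beyond the border
```

At the border pixel (2, 0), zero padding gives a distance of 1 because the padding acts as a wall, while symmetric padding gives 2 because the nearest background is now above or below the cell. The gradients, and hence uvec, differ accordingly.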
Therefore, for inference, when padding the original size, the padding mode should be symmetric.
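A minimal sketch of symmetric padding to a multiple of 32 with np.pad follows. Padding only at the bottom and right is an arbitrary choice made here, and the function name pad_to_multiple and the 520 x 704 array are illustrative.

```python
import numpy as np

def pad_to_multiple(img, multiple=32, mode="symmetric"):
    """Pad height and width (bottom/right) up to the next multiple of `multiple`."""
    h, w = img.shape[:2]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    # Leave any trailing axes (e.g. channels) unpadded.
    pad_width = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad_width, mode=mode), (h, w)

img = np.arange(520 * 704, dtype=np.float32).reshape(520, 704)
padded, (h, w) = pad_to_multiple(img)

# After inference, crop the prediction back to the original size.
restored = padded[:h, :w]
```

With mode="symmetric", the first padded row repeats the last original row, so an edge cell continues past the image border as a mirror image instead of ending at a wall of zeros.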
The training target for DN is the uvec map, which for the background is defined to be 0. This could be why the background is masked out at the model input, the output and the loss. The model is forced to look only at the parts of the image where there are cells and to predict the uvec in those regions, not anywhere else, because there is no target uvec for the background.
What if the background's uvec is taken into account?
In the paper, where the authors look at street scenes, is the background ignored because there aren't many background pixels? In these cell images, the background forms a significant proportion of the image, so this might make it important to take the background into account.
What if the background itself is treated like a gigantic cell? The target uvec for it can be computed. The model can be made to look at the entire image now, instead of only the cell regions, according to the semantic mask.
Obviously, as a cell, the background is a large region, so its pixels' instance area would be large, meaning the weight for the loss would be much smaller than the cells'.
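To get a feel for the numbers, here is a toy sketch that treats the background as one large instance. The weighting of 1 / sqrt(instance area) is an assumption made for this example (the text above only says that a larger area means a smaller weight), and the image and cell sizes are made up.

```python
import numpy as np
from scipy import ndimage

semseg = np.zeros((64, 64), dtype=np.uint8)
semseg[10:20, 10:20] = 1            # one toy cell, area 100

# Treat the background as a single giant "cell" and compute its own dtfm:
# the distance from each background pixel to the nearest cell pixel.
background = semseg == 0
bg_dtfm = ndimage.distance_transform_edt(background)

cell_area = int(semseg.sum())
bg_area = int(background.sum())

# Assumed weighting: inverse square root of the instance area.
cell_weight = 1 / np.sqrt(cell_area)
bg_weight = 1 / np.sqrt(bg_area)
```

Under this assumed weighting, each background pixel counts for much less than a cell pixel, although there are far more background pixels in total.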