davyneven/SpatialEmbeddings

Why not use GT in the instance seed loss calculation?

Closed this issue · 5 comments

seed loss

            seed_loss += self.foreground_weight * torch.sum(
                torch.pow(seed_map[in_mask] - dist[in_mask].detach(), 2))

We usually use a prediction and the ground truth to calculate losses, but in your loss function both seed_map and dist are predictions. So why not use the GT in the instance seed loss calculation? Shouldn't it be

            seed_loss += self.foreground_weight * torch.sum(
                torch.pow(gt[in_mask] - dist[in_mask].detach(), 2))

or

            seed_loss += self.foreground_weight * torch.sum(
                torch.pow(seed_map[in_mask] - gt[in_mask].detach(), 2)) ?

lxtGH commented

Hi! I have the same question about this @davyneven Could you kindly reply to it?

I understand the confusion. However, the seed map is not a background/foreground map, but rather a (learned) indicator (comparable to aleatoric uncertainty) of the error of each pixel's center regression. So the ground truth should be the error of each pixel when regressing to the center, which is "dist[in_mask].detach()". Note that the .detach() is important here, since we do not want to backpropagate through this value, but use it as ground truth.
Intuitively, what will happen is that the borders around objects will get a low value, since those pixels are the most likely to have a high error when regressing to the object's center. And those are the ones we do not want to select as a seed.
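To make the self-supervised nature of this term concrete, here is a minimal stand-alone sketch of the seed-loss computation. The function name and the default weight are hypothetical; `seed_map`, `dist`, and `in_mask` follow the snippet quoted above.

```python
import torch

def seed_loss_sketch(seed_map, dist, in_mask, foreground_weight=10.0):
    """Hypothetical stand-alone version of the seed-loss term above.

    seed_map: predicted per-pixel seediness score
    dist:     exp(-||e_i - C_k||^2 / (2*sigma_k^2)), i.e. how well each
              pixel's spatial embedding fits the instance's gaussian
    in_mask:  boolean mask selecting the pixels of one instance
    """
    # dist is detached, so it acts as a (moving) pseudo ground truth:
    # gradients flow only into seed_map, not back into the embeddings.
    return foreground_weight * torch.sum(
        torch.pow(seed_map[in_mask] - dist[in_mask].detach(), 2))
```

Because of the `.detach()`, backpropagating this loss updates only the seed-map branch; the embedding/sigma branches are trained by the instance loss alone.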

Thank you for the reply. Have you ever tried this for lane-line recognition, as you did in "Towards End-to-End Lane Detection: an Instance Segmentation Approach"?
I have tried your instance segmentation for lane-line recognition and found it very hard to cluster the embedded points into one lane-line instance. However, "Towards End-to-End Lane Detection: an Instance Segmentation Approach", which uses 4-dim embeddings, does better at lane-line clustering. So maybe 2-dim spatial embeddings are not suitable for lane-line clustering?

Yes, learning offset vectors to an object's center works wonderfully for objects with a well-defined center, like a car or a person (typically box-like objects). However, as you noticed, this is not the case for lanes, where the center of mass may vary a lot with different curvatures, etc. One thing you could try, though, is to set 'to_center' in the spatial embedding loss to False. This way, the network can learn the center of attraction itself, which may be more stable for lane segmentation.
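The difference between a fixed and a learned center of attraction can be sketched as follows. This is an illustrative simplification, not the repository's implementation; the function name, argument shapes, and `to_center` flag semantics are assumptions based on the description above.

```python
import torch

def center_of_attraction(coords, offsets, in_mask, to_center=True):
    # coords:  (2, H, W) pixel coordinate grid x_i
    # offsets: (2, H, W) predicted offset vectors o_i
    # spatial embedding of every pixel: e_i = x_i + o_i
    emb = coords + offsets
    if to_center:
        # fixed target: the instance's center of mass, which suits
        # compact, box-like objects with a well-defined center
        return coords[:, in_mask].mean(dim=1, keepdim=True)
    # learned target: the mean embedding of the instance's own pixels,
    # which the network is free to place anywhere (e.g. along a lane)
    return emb[:, in_mask].mean(dim=1, keepdim=True)
```

With `to_center=False` the attraction point is defined by the predictions themselves, so for elongated shapes like lanes the network can converge on a more convenient point than the geometric center of mass.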

I also tried a 'learnable center of attraction', but it seems that it's hard to cluster the lane-lines in 2 dimensions.