[Question] Combine boxes and segmentation mask?
❓ Question
nnUNet provides segmentation masks with classes, and running connected components on them then provides segmentation instances.
Is it possible to use nnDetection to provide the same information?
Looking at the nnDetection outputs, the boxes provide class and instance information, as well as model score and box coordinates.
Is it possible to process the boxes and the segmentation mask generated via `-o inference_kwargs.do_seg=True` to relate the boxes to the information in the mask, i.e., to identify the voxels that belong to each box?
It looks like the generated segmentation mask is binary, so it is not clear how to relate the boxes to segmented areas in the mask without running connected components on the mask.
Hi @beaslera,
Thank you for your question! The segmentation feature (`inference_kwargs.do_seg=True`) is still very experimental and has not been tested. The generated segmentation mask is indeed binary, and nnDetection does not provide built-in tools to link it to the boxes. You would need to create your own script: run connected components (or another instance-extraction method) on the binary mask, then associate the resulting components with the bounding boxes based on spatial overlap or other criteria.
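For illustration, here is a minimal sketch of that post-processing. It is not part of nnDetection; the box format `[x1, y1, z1, x2, y2, z2]` in voxel coordinates and the overlap criterion are assumptions you would need to adapt:

```python
import numpy as np
from scipy import ndimage

def match_components_to_boxes(binary_mask, boxes, min_overlap=0.5):
    """Label connected components in a binary mask and assign each one
    to the box covering the largest fraction of its voxels.

    binary_mask: (X, Y, Z) array of 0/1
    boxes: (N, 6) array of [x1, y1, z1, x2, y2, z2] voxel coordinates (assumed format)
    Returns: dict mapping component label -> box index (or None if no box
    covers at least `min_overlap` of the component).
    """
    labels, num = ndimage.label(binary_mask)
    assignment = {}
    for comp in range(1, num + 1):
        comp_voxels = np.argwhere(labels == comp)
        best_box, best_frac = None, 0.0
        for i, (x1, y1, z1, x2, y2, z2) in enumerate(boxes):
            # Fraction of this component's voxels that lie inside box i.
            inside = ((comp_voxels[:, 0] >= x1) & (comp_voxels[:, 0] < x2) &
                      (comp_voxels[:, 1] >= y1) & (comp_voxels[:, 1] < y2) &
                      (comp_voxels[:, 2] >= z1) & (comp_voxels[:, 2] < z2))
            frac = inside.mean()
            if frac > best_frac:
                best_box, best_frac = i, frac
        assignment[comp] = best_box if best_frac >= min_overlap else None
    return assignment
```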
Best,
Partha
Thank you, Partha.
- Since I need the semantic segmentation to be connected with the class and instance information, do you expect there to be any benefit to using nnDetection over nnUNet, or would you recommend nnUNet for this use case? I've reviewed the results in the MICCAI 2021 paper, but those comparisons only looked at the boxes, not the segmentation masks. Additionally, I would have to do post-processing on an experimental segmentation mask.
- As a follow-on to that question, if we filter down to the nnDetection boxes with a score >= 0.5, should they match the semantic segmentations? Or, alternatively, could there be voxels with no matching boxes and/or boxes with no matching voxels?
- Could you explain how to save the class for each voxel? If I could do that, nnDetection would provide the same type of mask data that nnUNet does. If I'm following the code well enough, the SegmentationEvaluator calculates the Dice per class, so the model has class information per voxel, but it isn't clear where/how the saved data becomes binary.
Thank you again for your time.
Hi @beaslera,
- There isn't a straightforward answer to this, since the two frameworks are optimized for different objectives. nnDetection is optimized for object detection and excels at identifying objects that nnUNet might miss. nnUNet, on the other hand, is designed for segmentation and typically provides more accurate segmentations for the objects it does identify. If your primary need is instance segmentation, nnUNet with post-processing (e.g., connected components) to assign instance labels might be the simpler and more robust option. Also, the segmentation performance of nnDetection may be worse than that of nnUNet due to its lightweight decoder.
- Setting a score threshold of 0.5 does not guarantee that the nnDetection boxes will perfectly match the semantic segmentations. There could be objects that were detected and segmented properly but have some voxels outside the bounding boxes. There could also be false-positive boxes predicted by nnDetection that do not correspond to any voxels in the segmentation.
- There are a few possible approaches:
  - Use a class-specific segmentation head, though this approach can introduce training instabilities.
  - Assign each voxel the class of the bounding box it falls within (with a majority vote where bounding boxes overlap), and apply a KNN-like algorithm for voxels that fall within no bounding box; see the sketch below.
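As a rough illustration of that second approach (a hedged sketch, not nnDetection code; integer voxel coordinates, the box format, and argmax tie-breaking are all assumptions):

```python
import numpy as np
from scipy import ndimage

def assign_voxel_classes(binary_mask, boxes, box_classes):
    """Assign a class to every foreground voxel from the predicted boxes.

    binary_mask: (X, Y, Z) array of 0/1
    boxes: (N, 6) [x1, y1, z1, x2, y2, z2] voxel coordinates (assumed format)
    box_classes: (N,) integer class labels, with 0 reserved for background
    Returns: (X, Y, Z) int array, 0 = background.
    """
    # One vote channel per class; memory-heavy but simple for a sketch.
    votes = np.zeros(binary_mask.shape + (int(box_classes.max()) + 1,), dtype=np.int32)
    for box, cls in zip(boxes, box_classes):
        x1, y1, z1, x2, y2, z2 = map(int, box)
        votes[x1:x2, y1:y2, z1:z2, int(cls)] += 1  # one vote per covering box

    out = votes.argmax(axis=-1)    # majority vote across overlapping boxes
    out[binary_mask == 0] = 0      # keep background empty

    # Nearest-neighbor fallback for foreground voxels covered by no box
    # (a simple stand-in for the KNN-like assignment mentioned above).
    unassigned = (binary_mask > 0) & (out == 0)
    if unassigned.any() and (out > 0).any():
        # For each unlabeled voxel, find the nearest already-labeled voxel.
        _, idx = ndimage.distance_transform_edt(out == 0, return_indices=True)
        out[unassigned] = out[tuple(i[unassigned] for i in idx)]
    return out
```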
Hope it helped!
Best,
Partha
Yes, thank you Partha, that is all very helpful.
Would you mind explaining why the nnDetection decoder would be considered lightweight?
The nnDetection decoder is more lightweight than nnUNet's because it uses an FPN architecture to aggregate features across multiple scales and capture object-level information, so it may not handle the finer details needed for semantic segmentation as well.
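To make the contrast concrete, here is a minimal FPN-style decoder sketch (an illustration of the general pattern, not nnDetection's actual implementation; shown in 2D for brevity, while nnDetection operates in 3D):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Generic FPN-style decoder sketch.

    Each encoder level is projected to the same small channel count with a
    1x1 conv, then merged top-down by upsampling and addition. This keeps
    the decoder cheap, but fine high-resolution detail is compressed early,
    which is what makes it less suited to detailed semantic segmentation.
    """
    def __init__(self, in_channels=(64, 128, 256), channels=64):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in in_channels)

    def forward(self, feats):  # feats ordered fine -> coarse resolution
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # top-down pathway
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [sm(p) for sm, p in zip(self.smooth, laterals)]
```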
Best,
Partha
I think that is enough information for me to move forward, but I would like to understand better whether the highest resolution layer has class information for the segmentation mask.
Your comment above said I could use a class-specific segmentation head to get a segmentation mask with class information. Could I instead modify the code that saves the segmentation mask at inference?
The SegmentationEvaluator calculates the Dice per class, and the docstring of its run_online_evaluation describes this (see nnDetection/nndet/evaluator/seg.py, lines 53 to 55 at b0504dc).
Based on that, it seems that the segmentation output size for a batch is [N, C, dims]. This part of the SegmentationEnsembler seems to agree (nnDetection/nndet/inference/ensembler/segmentation.py, lines 267 to 269 at b0504dc).
So I am having trouble understanding why the saved segmentation mask is binary and why a class-specific head would be the right approach. Can the highest-resolution layer not provide per-class probabilities? Or is the loss function for the segmentation mask not explicitly training the segmentation mask towards accurate classes (e.g., instead just foreground vs background)? Or is there some complication about how the segmentation mask gets saved to file that is forcing it to be a binary output?
Thank you very much for helping me understand.
Hi @beaslera,
In the current implementation, nnDetection by default uses a segmentation head that only distinguishes between foreground and background, which is why the saved segmentation mask is binary rather than per-class. You can check which segmentation head is used in the source code.
To obtain segmentation masks with class information, you may look into the DiCESegmenter head, and you would probably also need to modify the pipeline to make it work.
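If the head does produce per-class logits of shape [N, C, dims], converting them to a class mask at inference is conceptually just an argmax instead of a foreground threshold. A hedged sketch (tensor names, shapes, and channel layout are assumptions, not nnDetection's actual export code):

```python
import torch

def logits_to_class_mask(seg_logits: torch.Tensor) -> torch.Tensor:
    """Convert per-class segmentation logits to an integer class mask.

    seg_logits: [N, C, X, Y, Z] raw logits, with channel 0 assumed to be
    background. Returns an [N, X, Y, Z] integer mask (0 = background),
    instead of a binary foreground/background thresholding.
    """
    probs = torch.softmax(seg_logits, dim=1)  # per-class probabilities
    return probs.argmax(dim=1)                # class index per voxel
```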
Best,
Partha
Great, thank you Partha! I appreciate you taking the time to answer my questions.