MIC-DKFZ/nnDetection

[Question]Visualize predicted bounding boxes

NataliaAlves13 opened this issue · 3 comments

❓ Question

I want to generate a NIfTI image from the prediction pkl files so that I can visualize the predicted bounding boxes as an overlay on the ground truth. How are the points stored in the pred_boxes variable? Is it [z_min, y_min, x_min, z_max, y_max, x_max]? That scheme doesn't fit my results (see example below):
pred_boxes = [ [ 74.778915 301.09387 81.22717 314.0541 267.7665 280.9015 ]
[ 68.93939 304.47952 77.68393 317.9851 217.12233 230.096 ]
[ 76.44196 303.1566 81.405975 312.67078 270.30823 279.60272 ]
[185.25406 174.5544 192.86487 187.48819 369.2219 382.15186 ]]
pred_scores = [0.9844008 0.9062509 0.6013834 0.7847769]
original_size_of_raw_data = [208 512 512]
itk_origin = (-249.51171875, -393.01171875, -1369.5)
itk_spacing = (0.9765629768371582, 0.9765629768371582, 3.0)
itk_direction = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)

Dear @NataliaAlves13,

Handling the format can be somewhat complicated because SimpleITK (sitk) reverses the axis order when an image is converted to an array. The two examples below show how the order of the prediction entries follows the axis order of the array:

NIfTI image/sitk image ordering: z, y, x
Array ordering (after GetArrayFromImage): x, y, z
Predictions: x_min, y_min, x_max, y_max, z_min, z_max

NIfTI ordering: x, y, z
Array ordering: z, y, x
Predictions: z_min, y_min, z_max, y_max, x_min, x_max

Note that the entries of a prediction always follow the axis order of the array: the first four entries are the minima and maxima along the first two array axes (axis0_min, axis1_min, axis0_max, axis1_max), and the last two entries are the minimum and maximum along the last array axis.
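As a rough illustration of the layout described above, the following sketch turns one predicted box into integer index ranges per array axis. The helper name `box_to_axis_ranges` and the rounding choice (floor for the minimum, ceil for the maximum, clipped to the array shape) are assumptions made for this example, not part of nnDetection. The box and shape are taken from the question.

```python
import math


def box_to_axis_ranges(box, shape):
    """Convert one box into (start, stop) index ranges, one per array axis.

    Assumed layout (hypothetical helper, follows the description above):
    (axis0_min, axis1_min, axis0_max, axis1_max, axis2_min, axis2_max)
    """
    a0_min, a1_min, a0_max, a1_max, a2_min, a2_max = box
    mins = (a0_min, a1_min, a2_min)
    maxs = (a0_max, a1_max, a2_max)
    ranges = []
    for lo, hi, size in zip(mins, maxs, shape):
        # Round outwards so the whole box is covered, then clip to the array.
        start = max(0, math.floor(lo))
        stop = min(size, math.ceil(hi))
        ranges.append((start, stop))
    return ranges


# First box and array shape from the question (array ordering z, y, x).
box = [74.778915, 301.09387, 81.22717, 314.0541, 267.7665, 280.9015]
shape = [208, 512, 512]
ranges = box_to_axis_ranges(box, shape)
print(ranges)
```

With a NumPy array `mask` of that shape, the three ranges could be used as slices along axes 0, 1 and 2 to fill the box region. The filled array could then be passed to `sitk.GetImageFromArray`, with origin, spacing and direction copied from the original image, to write the overlay.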

Regarding the format:
The (x_min, y_min, x_max, y_max) format is widely used in object detection frameworks for natural (2D) images. We extended it by appending the min and max of the third axis so that code from that domain can be reused. This allows us to incorporate the third dimension without rewriting entire functions.
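To illustrate why appending the third axis keeps 2D code reusable, here is a minimal sketch of a size function that handles both layouts. The function `box_extent` is hypothetical and written only for this example; it is not taken from nnDetection.

```python
def box_extent(box):
    """Area of a 2D box or volume of a 3D box in the layout described above."""
    # The first four entries have the same meaning in 2D and 3D.
    size = (box[2] - box[0]) * (box[3] - box[1])
    # A 3D box only adds two trailing entries for the last axis.
    if len(box) == 6:
        size *= box[5] - box[4]
    return size


area = box_extent([0, 0, 2, 3])          # 2D box
volume = box_extent([0, 0, 2, 3, 1, 5])  # same box extended along a third axis
print(area, volume)
```

Because the first four entries are unchanged, the 2D branch is untouched and the 3D case is a small addition at the end.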