MIC-DKFZ/nnDetection

[Question] Issues with preprocessing a RibFrac-like dataset

manellopez13 opened this issue · 7 comments

❓ Question

Hi, I'm preprocessing a dataset that should resemble RibFrac: it also consists of CT scans with semantic segmentation. The annotations are considerably smaller than in RibFrac, though.

I get the error message below; from what I gather, there's some issue with the boxes, or anchors, that the pipeline is using. But I'm not quite sure what the cause may be... I tried duplicating the dataset to see whether the number of boxes would increase, so that some would be available for anchor planning, but I got the same error.

Could you please help? Thanks!

Here are the last lines of the output, with the error message:

2024-04-13 12:30:28.181 | INFO     | nndet.ptmodule.retinaunet.base:from_config_plan:421 - Model Inference Summary: 
detections_per_img: 100 
score_thresh: 0 
topk_candidates: 10000 
remove_small_boxes: 0.01 
nms_thresh: 0.6
2024-04-13 12:30:28.284 | INFO     | nndet.planning.estimator:estimate:122 - Found available gpu memory: 11166810112 bytes / 10649.5 mb and estimating for 11511726080 bytes / 10978.4375
2024-04-13 12:30:28.317 | INFO     | nndet.planning.estimator:_estimate_mem_available:153 - Estimating in memory.
2024-04-13 12:30:28.318 | INFO     | nndet.planning.estimator:measure:192 - Estimating on cuda:0 with shape [1, 160, 112, 112] and batch size 4 and num_instances 4
2024-04-13 12:30:48.956 | INFO     | nndet.planning.estimator:measure:255 - Measured: 90.0 mb empty, 8686.0 mb fixed, 8686.0 mb dynamic
2024-04-13 12:30:49.108 | INFO     | nndet.planning.architecture.boxes.c002:_plan_architecture:223 - decoder levels: (2, 3, 4, 5); 
pooling strides: [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 1, 1]]; 
kernel sizes: [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]]; 
patch size: [160 112 112]; 

2024-04-13 12:30:49.110 | INFO     | nndet.planning.architecture.boxes.c002:_plan_anchors:258 - Filtered 4 boxes, 0 boxes remaining for anchor planning.
Traceback (most recent call last):
  File "/opt/conda/bin/nndet_prep", line 33, in <module>
    sys.exit(load_entry_point('nndet', 'console_scripts', 'nndet_prep')())
  File "/opt/code/nndet/nndet/utils/check.py", line 62, in wrapper
    return func(*args, **kwargs)
  File "/opt/code/nndet/scripts/preprocess.py", line 406, in main
    run(OmegaConf.to_container(cfg, resolve=True),
  File "/opt/code/nndet/scripts/preprocess.py", line 335, in run
    run_planning_and_process(
  File "/opt/code/nndet/scripts/preprocess.py", line 162, in run_planning_and_process
    plan_identifiers = planner.plan_experiment(
  File "/opt/code/nndet/nndet/planning/experiment/v001.py", line 43, in plan_experiment
    plan_3d = self.plan_base_stage(
  File "/opt/code/nndet/nndet/planning/experiment/base.py", line 234, in plan_base_stage
    architecture_plan = architecture_planner.plan(
  File "/opt/code/nndet/nndet/planning/architecture/boxes/c002.py", line 127, in plan
    res = super().plan(
  File "/opt/code/nndet/nndet/planning/architecture/boxes/base.py", line 352, in plan
    anchors = self._plan_anchors(
  File "/opt/code/nndet/nndet/planning/architecture/boxes/c002.py", line 270, in _plan_anchors
    params = self.find_anchors(boxes_torch, strides.astype(np.int32), anchor_generator)
  File "/opt/code/nndet/nndet/planning/architecture/boxes/base.py", line 449, in find_anchors
    maxs = sizes.max(dim=0)[0]
IndexError: max(): Expected reduction dim 0 to have non-zero size.
srun: error: gpu012: task 0: Exited with exit code 1

Hey, something seems to be off: nnDetection only reports 4 boxes in your dataset, which crashed the code - Filtered 4 boxes, 0 boxes remaining for anchor planning.
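For context, the traceback boils down to a reduction over an empty tensor; a minimal repro, assuming every ground-truth box was filtered out before anchor planning:

import torch

# sizes normally holds one (d, h, w) row per box that survived filtering;
# with no boxes left, it is empty along dim 0
sizes = torch.empty((0, 3))
sizes.max(dim=0)  # IndexError: max(): Expected reduction dim 0 to have non-zero size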

Please also note that nnDetection requires instance segmentations, not semantic segmentations. For semantic segmentation problems, please refer to nnU-Net.
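In case it helps with the conversion, here is a minimal sketch (the paths, file names, and the single class id 0 are illustrative placeholders, not a fixed layout) of turning a binary semantic mask into instance labels via connected components, together with the per-case JSON that maps each instance id to a class id:

import json
import SimpleITK as sitk
from scipy import ndimage

sem = sitk.ReadImage("raw_splitted/labelsTr/case_000.nii.gz")  # 0 = background, 1 = foreground
sem_arr = sitk.GetArrayFromImage(sem)

# every connected foreground component becomes its own instance id (1..N)
instances, num_instances = ndimage.label(sem_arr > 0)
inst = sitk.GetImageFromArray(instances.astype("uint8"))
inst.CopyInformation(sem)  # keep spacing/origin/direction of the original mask
sitk.WriteImage(inst, "raw_splitted/labelsTr/case_000.nii.gz")

# each label file is paired with a JSON mapping instance id -> class id
mapping = {"instances": {str(i): 0 for i in range(1, num_instances + 1)}}
with open("raw_splitted/labelsTr/case_000.json", "w") as f:
    json.dump(mapping, f)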

Apologies, I meant instance segmentation.

I think I found out what was going on, and now it works fine. This was not the first time I had run the nndet_prep script, so some preprocessed files were already stored in the data directory. I removed all the cropped and preprocessed folders and reran the script. Now the code does not crash, and it outputs this message about the filtered boxes:

2024-04-16 13:03:06.219 | INFO     | nndet.planning.architecture.boxes.c002:_plan_anchors:258 - Filtered 20 boxes, 450 boxes remaining for anchor planning.

I guess I would have gotten the same result with the overwrite option set to True, right?

Looks good now :)
Not sure what was going on, since nnDetection would load the properties file, which includes all instances of your dataset, irrespective of whether cropped/preprocessed data is already present.

Not sure either... I just remembered that the first time I ran the script, the imagesTr and labelsTr directories were missing; I only had imagesTs and labelsTs, since my plan is just to use the model trained on RibFrac on this other dataset. But these Tr directories are required to run any nndet script, so of course the script failed. I created the directories and put a single CT scan there, just to fulfil the requirement... a CT scan with only 4 annotations. So maybe that's the origin of those 4 boxes.

Anyway, there's nothing wrong with your script; it's just my fault for running the code without meeting the requirements. My apologies...

But before closing the issue, may I ask something else?

Now, to use the RibFrac-trained Retina U-Net on my data, this is my plan:

  1. I will create the directory det_models/Task100_MyData/RetinaUNetV001_D3V001_3d/consolidated and copy into it all files from det_models/Task020_RibFrac/RetinaUNetV001_D3V001_3d/consolidated (see the sketch after this list)
  2. I will run eval with nndet_eval 100 RetinaUNetV001_D3V001_3d -1 --test --boxes --analyze_boxes
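For step 1, I mean something like this (assuming det_models resolves to the models directory of my setup):

import shutil
from pathlib import Path

det_models = Path("det_models")  # adjust to wherever det_models points
src = det_models / "Task020_RibFrac" / "RetinaUNetV001_D3V001_3d" / "consolidated"
dst = det_models / "Task100_MyData" / "RetinaUNetV001_D3V001_3d" / "consolidated"
shutil.copytree(src, dst)  # creates dst and copies the checkpoints, plans, and configs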

Did I forget something? I feel like this would be all. Maybe I'll have to tweak some parameters in plan.pkl, though, to get the best performance on the new data, right?
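(If tweaking is needed, I'd start by just inspecting the file with something like this; the path is illustrative:)

import pickle
from pathlib import Path

plan_path = Path("det_models/Task100_MyData/RetinaUNetV001_D3V001_3d/consolidated/plan.pkl")
with open(plan_path, "rb") as f:
    plan = pickle.load(f)
print(type(plan))
if isinstance(plan, dict):
    print(list(plan))  # top-level parameter names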

Thanks a lot, Michael

Ok, I'm not quite sure what exactly crashed, since the imagesTr folders are not needed for inference. Admittedly, the current way of running inference is a bit cumbersome; the new interface for V2 is already finished and will simplify this in the future :)

For now though:

  • You need to run nndet_predict after step 1 to predict the data. You will probably need to use --force_args because the task names do not match (this is currently checked :) ). After prediction, you can run eval, which will only run the evaluation.
  • If your data is very close to RibFrac, it might be okay to simply reuse the original hyperparameters :) Usually the parameters are pretty reasonable anyway, and performance will only vary slightly between a good and a "perfect" postprocessing.

Oh... I see. It is clearly stated in the README that nndet_predict also preprocesses the data in imagesTs. My bad; I thought I had to run nndet_prep, which is why I created the Tr directories.

Cool! I'm currently running nndet_predict. I chose nndet_predict 100 RetinaUNetV001_D3V001_3d --fold -1 --no_preprocess --force_args --check, where I skip preprocessing since I already did it with nndet_prep, and --force_args is necessary, as you indicated. The first lines of the output file show the effect of --force_args:

2024-04-18 09:25:04.054 | WARNING  | scripts.predict:set_arg:129 - Found different values for task, will overwrite Task020_RibFrac with Task100_PMRibFrac
2024-04-18 09:25:04.054 | WARNING  | scripts.predict:set_arg:129 - Found different values for fold, will overwrite 0 with -1
Start data and label check: test=True
...
INFO Running inference
INFO Found 5 models to ensemble
...

I'll write again once it's finished.

Thanks for everything!

Oops, I forgot to write back!
Everything worked properly! Thanks for the help, Michael!