securefederatedai/openfl

Model tf_2dunet - Plan initialisation fails expecting /raid/datasets/MICCAI_BraTS_2019_Data_Training/HGG/0

Closed this issue · 7 comments

Describe the bug
While trying the Quick Start Guide for model tf_2dunet, the plan initialisation step is failing.

Last few lines from the error message:

File "/home/azureuser/openfl/tests/openfl_e2e/my_workspace/src/tfbrats_inmemory.py", line 29, in __init__
    X_train, y_train, X_valid, y_valid = load_from_nifti(parent_dir=data_path,
  File "/home/azureuser/openfl/tests/openfl_e2e/my_workspace/src/brats_utils.py", line 94, in load_from_nifti
    subdirs = os.listdir(path)
FileNotFoundError: [Errno 2] No such file or directory: '/raid/datasets/MICCAI_BraTS_2019_Data_Training/HGG/0'

To Reproduce
Steps to reproduce the behavior:

  1. Follow the steps in the Quick Start Guide, replacing the model torch_cnn_mnist with tf_2dunet.
  2. Create the workspace and certify it.
  3. Generate a CSR for the aggregator and sign it with the CA.
  4. Initialise the plan: fx plan initialize

At this step the error is thrown.

Expected behavior
There should be no error during plan initialisation.

Screenshots
(screenshot of the error attached)

Machine

  • Ubuntu 22.04

Additional

The README.md describes the expected dataset structure for MICCAI_BraTS_2019_Data_Training.
But how exactly do you download the dataset? Is this documented anywhere?

For practice purposes, I found a dataset at https://www.kaggle.com/datasets/aryashah2k/brain-tumor-segmentation-brats-2019, but it contains many subfolders rather than the expected 0 and 1.

fx plan initialize currently takes the first entry from data.yaml. You either need to overwrite that entry directly to point at your dataset, or you can pass the --input_shape flag if you know the expected data shape.
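For example, data.yaml in the workspace maps each collaborator to a data path; the collaborator names and the exact format below are illustrative assumptions, not taken from this issue, so check the file generated by your workspace template:

```
# plan/data.yaml (illustrative entries only)
one,/raid/datasets/MICCAI_BraTS_2019_Data_Training/HGG/0
two,/raid/datasets/MICCAI_BraTS_2019_Data_Training/HGG/1
```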

To gain access to the data, you originally needed to send an access request to the MICCAI BraTS challenge, but that Kaggle link does appear to be the proper data. If so, the README.md includes steps to shard the data.

If someone is able to run an experiment after the fix, we should also consider closing #366 and #398.

Hi @noopurintel,

I downloaded the dataset from the Kaggle link you mentioned: https://www.kaggle.com/datasets/aryashah2k/brain-tumor-segmentation-brats-2019

After that I followed the README.md.
The steps I took were:

  1. Download the dataset from https://www.kaggle.com/datasets/aryashah2k/brain-tumor-segmentation-brats-2019
  2. Unzip the dataset: unzip archive.zip -d /raid/datasets/
  3. Check the unzipped dataset with the tree command: /raid/datasets# tree $DATA_PATH -L 2
.
-- MICCAI_BraTS_2019_Data_Training
    |-- HGG
    |-- LGG
    |-- name_mapping.csv
    `-- survival_data.csv

3 directories, 2 files
  4. cd MICCAI_BraTS_2019_Data_Training/HGG/
  5. export SUBFOLDER=HGG
  6. Run this loop in the terminal for 2 collaborators; change the modulus to the number of collaborators, as mentioned in the README:
i=0
for f in *;
do
    d=$((i % 2));  # change 2 to the number of data slices (collaborators in the federation)
    mkdir -p "$d";
    mv "$f" "$d";
    i=$((i + 1));
done
  7. Check the result: /raid/datasets/MICCAI_BraTS_2019_Data_Training/HGG# tree -L 1
.
|-- 0
`-- 1

2 directories, 0 files
  8. Follow the Quick Start Guide:
          INFO     Creating Initial Weights File    🠆 save/tf_2dunet_brats_init.pbuf     plan.py:195
          INFO     FL-Plan hash is 196b877a93866735ca18687a2d1f94ad6dca8a3f0de541f84ca267ccc5fd63be00dd488102c0540c0b4efb434653b2c0     plan.py:287
          INFO     ['plan_196b877a']     plan.py:222

 ✔️ OK
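The round-robin sharding loop from step 6 can also be sketched in Python. This is a hypothetical equivalent of the shell loop, demonstrated on a throwaway temporary directory rather than the real BraTS dataset; the subject folder names are made up:

```python
import os
import shutil
import tempfile

def shard_round_robin(parent_dir: str, n_slices: int) -> None:
    """Move each entry in parent_dir into shard directories 0..n_slices-1,
    assigned round-robin, mirroring the shell loop from the README."""
    entries = sorted(os.listdir(parent_dir))
    for i, name in enumerate(entries):
        shard = os.path.join(parent_dir, str(i % n_slices))
        os.makedirs(shard, exist_ok=True)
        shutil.move(os.path.join(parent_dir, name), shard)

# Demo on a temporary directory standing in for .../HGG
with tempfile.TemporaryDirectory() as hgg:
    for subject in ("BraTS19_A", "BraTS19_B", "BraTS19_C", "BraTS19_D"):
        os.mkdir(os.path.join(hgg, subject))
    shard_round_robin(hgg, n_slices=2)
    print(sorted(os.listdir(hgg)))  # → ['0', '1']
```

Passing a different n_slices produces one shard directory per collaborator, which matches what the tree output in step 7 should show.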

For the error mentioned below, I have a fix in #1178.

File "<__array_function__ internals>", line 200, in concatenate
ValueError: need at least one array to concatenate

@noopurintel, can you confirm this and let us know, so we can close the issue accordingly?

@rahulga1 @tanwarsh @kta-intel - We tried this. Below are our observations.

  1. With 4 CPUs and 16 GB RAM, the initialization process is killed on its own after 2-3 minutes into the run.
  2. With 16 CPUs and 64 GB RAM, it took 4 hours to complete 2 rounds of training.

Could you please suggest or document the minimal configuration required to test this? Also, roughly how long should one round of training take? This would be helpful for users.

Hi @noopurintel,

I was able to complete a 10-round experiment in 4-5 hours with 16 CPUs and 64 GB RAM.

Hi @noopurintel, Closing this issue as the error is resolved.