This is the official implementation of ICONIP 2023: Dynamic Data Augmentation via MCTS for Prostate MRI Segmentation.
To run the code base, first clone the repository and install all requirements:
pip install -r requirements.txt
Then navigate into DDAug/ and execute:
pip install -e .
- A Multi-site Dataset for Prostate MRI Segmentation (Subsets 1-6)
  - Sites A, B, C, D, E, and F correspond to subsets 1, 2, 3, 4, 5, and 6, respectively.
- nnUNet Prostate MRI dataset (Subset 7)
The folder structure follows the nnUNet folder structure. To ensure training runs without issues, the environment variables nnUNet_raw_data_base, nnUNet_preprocessed, and RESULTS_FOLDER must be set. The expected folder structure is shown below.
- Create a folder for the raw data and assign its path to the environment variable nnUNet_raw_data_base. The data inside it are expected to follow this layout:

nnUNet_raw_data_base/
└── nnUNet_raw_data/
    ├── Task001_Prostate_subset1/
    │   ├── imagesTr/
    │   │   ├── some_file_name_00_0000.nii.gz
    │   │   ├── some_file_name_01_0000.nii.gz
    │   │   ├── some_file_name_02_0000.nii.gz
    │   │   └── ....
    │   ├── imagesTs/
    │   │   ├── some_file_name_03_0000.nii.gz
    │   │   ├── some_file_name_05_0000.nii.gz
    │   │   ├── some_file_name_11_0000.nii.gz
    │   │   └── ...
    │   ├── labelsTr/
    │   │   ├── some_file_name_00.nii.gz
    │   │   ├── some_file_name_01.nii.gz
    │   │   ├── some_file_name_02.nii.gz
    │   │   └── ....
    │   └── dataset.json
    ├── Task002_Prostate_subset2/
    │   ├── imagesTr/
    │   │   └── ...
    │   ├── imagesTs/
    │   │   └── ...
    │   ├── labelsTr/
    │   │   └── ...
    │   └── dataset.json
Here the two-digit _00_ in some_file_name_00_0000 indicates the scan number, and the four digits at the end indicate the modality number. The essential content of dataset.json includes:

{
  "name": "Prostate_RUNMC",
  "description": "",
  "reference": "",
  "licence": "",
  "release": "",
  "tensorImageSize": "4D",
  "modality": { "0": "MRI" },
  "labels": { "0": "background", "1": "PZ", "2": "TZ" },
  "numTraining": 30,
  "numTest": 0,
  "training": [
    { "image": "./imagesTr/RUNMC_10.nii.gz", "label": "./labelsTr/RUNMC_10.nii.gz" },
    { "image": "./imagesTr/RUNMC_06.nii.gz", "label": "./labelsTr/RUNMC_06.nii.gz" },
    ...
  ],
  "test": [],
  "testing": []
}
Please note that, due to the limited data size and as described in the paper, we report the mean DICE of 5-fold cross validation on the validation set, using the weights of the last epoch.
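The naming convention and the dataset.json fields described above can be sketched in Python. This is a minimal illustration, not part of the repository: the helper names split_image_name and build_dataset_json are hypothetical, and the sketch assumes every image file ends in a four-digit modality suffix followed by .nii.gz. As in the example above, the entries under "training" omit the modality suffix.

```python
import json
import re

# <case>_<modality>.nii.gz, e.g. some_file_name_00_0000.nii.gz
IMAGE_PATTERN = re.compile(r"^(?P<case>.+)_(?P<modality>\d{4})\.nii\.gz$")


def split_image_name(filename):
    """Return (case identifier, modality index) for an image file name."""
    match = IMAGE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match.group("case"), int(match.group("modality"))


def build_dataset_json(name, image_files, labels, modality):
    """Assemble the dataset.json fields from the file names in imagesTr (hypothetical helper)."""
    # One case may have several modality files; keep each case once.
    cases = sorted({split_image_name(f)[0] for f in image_files})
    return {
        "name": name,
        "description": "",
        "reference": "",
        "licence": "",
        "release": "",
        "tensorImageSize": "4D",
        "modality": modality,
        "labels": labels,
        "numTraining": len(cases),
        "numTest": 0,
        "training": [
            {"image": f"./imagesTr/{case}.nii.gz", "label": f"./labelsTr/{case}.nii.gz"}
            for case in cases
        ],
        "test": [],
        "testing": [],
    }


if __name__ == "__main__":
    files = ["some_file_name_00_0000.nii.gz", "some_file_name_01_0000.nii.gz"]
    dataset = build_dataset_json(
        "Prostate_RUNMC",
        files,
        labels={"0": "background", "1": "PZ", "2": "TZ"},
        modality={"0": "MRI"},
    )
    print(json.dumps(dataset, indent=2))
```

The printed dictionary can be written to dataset.json inside the task folder once the file list matches the contents of imagesTr.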
- After formatting all the raw data, create a folder for the processed data and assign its path to the environment variable nnUNet_preprocessed, then run the command

nnUNet_plan_and_preprocess -t Task001_Prostate_subset1

nnUNet will then create the processed data in

nnUNet_preprocessed/
├── Task001_Prostate_subset1/
│   └── ....
├── Task002_Prostate_subset2/
│   └── ....
- Finally, create a folder for the training results and assign its path to the environment variable RESULTS_FOLDER. Training logs and weights will be stored there.
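Before training, it can help to confirm that the three variables from the steps above are actually set. The following is a small sketch under that assumption; the helper names missing_nnunet_vars and ensure_folders are hypothetical and not part of nnUNet or this repository.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_VARS = ("nnUNet_raw_data_base", "nnUNet_preprocessed", "RESULTS_FOLDER")


def missing_nnunet_vars(env=None):
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def ensure_folders(env=None):
    """Create each configured folder if it does not exist yet."""
    env = os.environ if env is None else env
    for name in REQUIRED_VARS:
        os.makedirs(env[name], exist_ok=True)


if __name__ == "__main__":
    missing = missing_nnunet_vars()
    if missing:
        print("Please set:", ", ".join(missing))
    else:
        ensure_folders()
        print("All nnUNet folders are ready.")
```

Passing a plain dictionary as `env` makes the check easy to try out without touching the real environment.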
To run 5-fold cross validation, make sure all environment variables are set, and execute
for fold in 0 1 2 3 4; do CUDA_VISIBLE_DEVICES=1 nnUNet_train 3d_fullres nnUNetTrainerV2_MCTS Task001_Prostate_subset1 $fold --npz; done;
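If you prefer to launch the folds from Python (for example from a notebook), the shell loop above can be sketched as follows. The command arguments are copied from the loop above; the function names build_train_command and train_all_folds and the dry_run switch are illustrative assumptions, not part of the repository.

```python
import os
import subprocess


def build_train_command(task, fold, network="3d_fullres", trainer="nnUNetTrainerV2_MCTS"):
    """Return the nnUNet_train argument list for one fold."""
    return ["nnUNet_train", network, trainer, task, str(fold), "--npz"]


def train_all_folds(task, gpu="1", folds=range(5), dry_run=True):
    """Run (or, with dry_run=True, only print) the training command for each fold."""
    # Same effect as prefixing the command with CUDA_VISIBLE_DEVICES=<gpu>.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    commands = [build_train_command(task, fold) for fold in folds]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop as soon as one fold fails.
            subprocess.run(command, env=env, check=True)
    return commands


if __name__ == "__main__":
    train_all_folds("Task001_Prostate_subset1")
```

With the default dry_run=True the commands are only printed, so the sketch can be inspected safely before starting a long training run.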
Once training completes, with the same environment variables, execute
python nnunet/inference/summarize_val_folds.py
This generates the inference results and a csv file with the mean DICE score, using the weights from all 5 fold trainings. Please note that the option --disable_tta is set to True in the file nnunet/inference/summarize_val_folds.py.
You can use the code below in a Jupyter notebook to get a nice overview of all the results (the for-else loop is not a bug).
import os

import pandas as pd

# Replace RESULTS_FOLDER with the path stored in the RESULTS_FOLDER environment variable.
result_folder = "RESULTS_FOLDER/nnUNet/3d_fullres"
for each_task in sorted(os.listdir(result_folder)):
    print("-" * 100)
    task_dir = f"{result_folder}/{each_task}"
    for each_model in sorted(os.listdir(task_dir)):
        model_dir = f"{task_dir}/{each_model}"
        table = pd.DataFrame()
        normal_exit = False
        for fold in range(5):
            each_fold = f"fold_{fold}"
            # Folds without a result file are skipped; the loop never breaks.
            if not os.path.isfile(f"{model_dir}/{each_fold}/testing/result.csv"):
                continue
            # Drop the per-fold summary rows so that only per-case scores are pooled.
            fold_result = pd.read_csv(f"{model_dir}/{each_fold}/testing/result.csv", index_col=0).drop(
                index=["mean", "std"]
            )
            table = pd.concat([table, fold_result])
        else:
            normal_exit = True
            # normal exit: compute mean and std over the cases of all available folds
            mean_all = pd.DataFrame(
                data=[table.mean(axis=0).to_numpy()], columns=table.columns, index=["mean"]
            )
            std_all = pd.DataFrame(data=[table.std(axis=0).to_numpy()], columns=table.columns, index=["std"])
            table = pd.concat([table, mean_all, std_all])
            table.to_csv(f"{model_dir}/fold_summary.csv")
            print(f'out csv [{table.shape}] -> {each_task} {each_model.split("__")[0]}')
            print(mean_all.to_string())
        if not normal_exit:
            print(f'failed with -> {each_task} {each_model.split("__")[0]}')
    print("\n-----\n")
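For readers unfamiliar with the for-else construct used above: the else branch of a Python for loop runs whenever the loop finishes without hitting break. A loop that only uses continue, like the one above, therefore always reaches its else branch. The two toy functions below (hypothetical names, unrelated to the repository code) illustrate both cases.

```python
def all_folds_present(available, n_folds=5):
    """Return True only if every fold index in range(n_folds) is in `available`."""
    for fold in range(n_folds):
        if fold not in available:
            break  # leaving the loop via break skips the else branch
    else:
        # Reached only when the loop ran to completion without break.
        return True
    return False


def else_runs_with_continue(items):
    """Show that continue does not prevent the else branch from running."""
    ran_else = False
    for item in items:
        if item is None:
            continue  # skips the rest of this iteration only
    else:
        ran_else = True
    return ran_else


if __name__ == "__main__":
    print(all_folds_present({0, 1, 2, 3, 4}))    # all folds found
    print(all_folds_present({0, 1, 3}))          # fold 2 missing, loop breaks
    print(else_runs_with_continue([None, 1]))    # else still runs
```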
You can use draw_circle_using_mask.py to generate the images below. Modify lines 48, 60, and 61 to set the filename and folder paths.