FinnBehrendt/patched-Diffusion-Models-UAD

Couldn't reach paper results on BraTS21 dataset

Xikai97 opened this issue · 10 comments

Hi Behrendt,
Thanks so much for releasing the code for this project! It is very impressive work for UAD. I tried to train the pDDPM model with the default parameters in your repository. However, the average Dice is only 0.416. Below is part of the results for your reference.
Datamodules_eval.Brats21/test/AccuracyPerVolMean 0.9618777301139104
Datamodules_eval.Brats21/test/AccuracyPerVolStd 0.03505479496309627
Datamodules_eval.Brats21/test/AUCPerVolMean 0.9184630071711436
Datamodules_eval.Brats21/test/AUCPerVolStd 0.04592650580866205
Datamodules_eval.Brats21/test/AUPRCPerVolMean 0.4582559324147686
Datamodules_eval.Brats21/test/AUPRCPerVolStd 0.18912461549162585
Datamodules_eval.Brats21/test/BestDicePerVolMean 0.46190757024310775
Datamodules_eval.Brats21/test/BestDicePerVolStd 0.15122841488190034
Datamodules_eval.Brats21/test/BestThresholdPerVolMean 0.05139876317116432
Datamodules_eval.Brats21/test/BestThresholdPerVolStd 0
Datamodules_eval.Brats21/test/DicePerVolMean 0.4159915669422005
Datamodules_eval.Brats21/test/DicePerVolStd 0.16072703579074857
Datamodules_eval.Brats21/test/FNPerVolMean 6,251.2632493483925
Datamodules_eval.Brats21/test/FNPerVolStd 4,371.058174110206
Datamodules_eval.Brats21/test/FPPerVolMean 7,014.003475238923
Datamodules_eval.Brats21/test/FPPerVolStd 5,011.493203657907
Datamodules_eval.Brats21/test/FPRPerVolMean 0.5550672456773063
Datamodules_eval.Brats21/test/FPRPerVolStd 0.21507754248191388
Datamodules_eval.Brats21/test/HausPerVolMean 37.58951671198803
Datamodules_eval.Brats21/test/HausPerVolStd 7.7235294752802846
Datamodules_eval.Brats21/test/l1recoErrorAllMean 0.015985660830255335
Datamodules_eval.Brats21/test/l1recoErrorAllStd 0.005627935014539741
Datamodules_eval.Brats21/test/l1recoErrorHealthyMean 0.014097515680543193
Datamodules_eval.Brats21/test/l1recoErrorHealthyStd 0.005121284274648853

Regarding the data preprocessing, I noticed that your paper states that all preprocessed data have a fixed resolution of [192x192x160]. However, after running the prepare_Brats21.sh script, the processed data sample has a size of only [136x177x139], and the imageDim parameter set in MIDL23_DDPM/DDPM_patched.yaml is [192,192,100], which confused me. Did I process the BraTS dataset in the wrong way? Or should I change the parameters to match your paper, e.g. imageDim=[192,192,160], patch_size=60, test_timesteps=400?

Thanks again for your effort in releasing this project. Looking forward to your reply!

Hi,
Thanks for reaching out!
After processing, each brain is cut to the brain boundaries. To ensure the same size for all volumes, we pad them to [192,192,160]. This is handled in the code; you should not need to change anything from the default config. By the way, the results in Table 1 of our paper are achieved with a patch size of 48 and 500 timesteps.
Did you run the experiments like that? Furthermore, were all pre-processing steps successful?
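
As an illustration of the padding step described above, here is a minimal sketch that computes how many voxels would be added on each side of a cropped volume to reach a fixed target shape. It is not taken from the repository: the function name `symmetric_padding` is hypothetical, and centering the volume (with the extra voxel going to the far side when the difference is odd) is an assumption.

```python
def symmetric_padding(shape, target):
    """Return (before, after) padding per axis needed to grow `shape` to `target`.

    Assumption: the volume is centered; for an odd difference the extra
    voxel is placed after the data. The repository may pad differently.
    """
    pads = []
    for size, goal in zip(shape, target):
        if size > goal:
            raise ValueError(f"axis of size {size} exceeds target {goal}")
        total = goal - size
        before = total // 2
        pads.append((before, total - before))
    return pads


# Cropped size reported earlier in this thread, padded to the paper's size.
print(symmetric_padding((136, 177, 139), (192, 192, 160)))
# prints [(28, 28), (7, 8), (10, 11)]
```

The point is only that differently sized brain crops end up with the same shape after padding, so a cropped size such as [136x177x139] is not by itself a sign of failed preprocessing.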

Hi Behrendt,

Thanks for your reply. Actually, the results I showed above correspond to the default configuration in your code (test timesteps=500; patch size=48; imageDim=[192,192,100]). I strictly followed your pre-processing pipeline and used your script to process the original dataset. Unfortunately, the results still cannot exceed a Dice value of 0.45 on the BraTS21 dataset. Could you share more information about your preprocessed data, for example by releasing one or two preprocessed samples, so that I can compare them with my preprocessed data and verify the correctness of my pipeline?

Sure,
Here are some examples from the IXI data set.

IXI015-HH-1258_t2.nii.gz

IXI041-Guys-0706_t2.nii.gz

IXI037-Guys-0704_t2.nii.gz

If pre-processing works and there are no changes in the config, I do not see a reason for the worse performance. If you keep getting poor performance, I will try to clone the repo on a clean workstation to reproduce the behavior. Unfortunately, this may take some time.

Hi @FinnBehrendt,
Congratulations on this amazing work.
Unfortunately, I'm having the same issues as @Xikai97. After running the code for pDDPM, with the default configurations, the results I got are a bit far away from the results reported in the paper (below are the results I obtained w.r.t. DICE (%)):

Name                 BraTS   MSLUB   IXI
PDDPM (ORIGINAL)     49.00   10.35   11.05
PDDPM (REPRODUCED)   38.66    8.71   11.38

Another question is related to the DDPM method in Table 1. Can those results be reproduced with the flag experiment=MIDL23_DDPM/DDPM?

Thank you in advance!

Hi @CristianoPatricio ,
Thanks for bringing this up. Are there visible differences between the images I uploaded and the images you obtained from running process_ixi?
I am surprised that there are issues as pDDPMs have been implemented with comparable performance in other works.
Anyway, I will check this issue as soon as possible to ensure reproducible results.

Hi @FinnBehrendt ,
Thanks so much for your help! I compared the data you uploaded with my preprocessed data. It looks like there are some differences between the images. Below is a comparison of one slice between the sample you provided and my processed data.
[Image: my_processed]
[Image: Template]

I have also attached the whole processed volumes for your reference. I guess the problem might come from the pre-processing pipeline. I noticed that your preprocessing script uses the t1 image as the template for registration. I don't know whether this is a typo or whether the t1 template is intentionally used to register the t2w MRI images. Again, thank you very much for your reply!
IXI015-HH-1258_t2.nii.gz
IXI037-Guys-0704_t2.nii.gz
IXI041-Guys-0706_t2.nii.gz

Thanks for the comparison @Xikai97.
Indeed, the scans are different. However, from looking at them, it's hard to tell the exact difference.
You are right. There is a typo in the preprocessing script, and using the t2 template would be the correct way. However, for the MIDL publication, I used the t1 images as a template, so if you have not changed that, it should give similar results. Right now, I am running experiments to find the error/difference.

> Hi @CristianoPatricio, Thanks for bringing this up. Are there visible differences between the images I uploaded and the images you obtained from running process_ixi? I am surprised that there are issues as pDDPMs have been implemented with comparable performance in other works. Anyway, I will check this issue as soon as possible to ensure reproducible results.

Just ran the following lines of code:

path_image_1 = "/home/cristianopatricio/Downloads/IXI015-HH-1258_t2.nii.gz"
path_image_2 = "/home/cristianopatricio/Desktop/PhD/Methods/patched-Diffusion-Models-UAD/Data/Train/ixi/t2/IXI015-HH-1258_t2.nii.gz"

vol_1, _ = sitk_reader(path_image_1)
vol_2, _ = sitk_reader(path_image_2)

print(f"[vol shape] Behrendt: {vol_1.shape}")
print(f"[vol shape] Patricio: {vol_2.shape}")

and the output was:

[vol shape] Behrendt: (140, 176, 140)
[vol shape] Patricio: (138, 173, 134)

There's a difference in the shape!
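
For anyone who wants to repeat this check without an imaging library installed, the following is a minimal sketch that reads the dimensions straight from the file header. It assumes the files are NIfTI-1 (a 348-byte header with the `dim` field at byte offset 40); the function name `nifti1_shape` is hypothetical and not part of the repository. Note that the header lists axes as (x, y, z), while some readers, for example arrays obtained through SimpleITK, report them in reverse order.

```python
import gzip
import struct


def nifti1_shape(path):
    """Return the image shape stored in a NIfTI-1 header (.nii or .nii.gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        header = handle.read(348)
    if len(header) < 348:
        raise ValueError("file is too short to contain a NIfTI-1 header")

    # sizeof_hdr (first 4 bytes) must equal 348; use it to detect byte order.
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", header[:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 file (sizeof_hdr != 348)")

    # dim[0] is the number of dimensions, dim[1:] holds the size of each axis.
    dim = struct.unpack(endian + "8h", header[40:56])
    return tuple(dim[1:1 + dim[0]])
```

Calling `nifti1_shape` on the author's sample and on the locally preprocessed file with the same name gives the two shapes to compare. A mismatch, as in the output above, shows that the brain crop differs before any padding is applied.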

Just to be sure that I collected the data correctly, here are the steps I followed to arrange the data in the structure required for preprocessing:

  • I downloaded the BraTS21 dataset from here: https://www.kaggle.com/datasets/dschettler8845/brats-2021-task1/data. Then, I moved all the BraTS2021_XXXXX_t2.nii.gz files to the t2/ folder, and all the BraTS2021_XXXXX_seg.nii.gz files to the seg/ folder.
  • I downloaded the IXI dataset from here: https://brain-development.org/ixi-dataset/ (only T2 images). Then, I moved all the files to the t2/ folder.
  • Finally, I downloaded the MSLUB dataset from here: https://lit.fe.uni-lj.si/en/research/resources/3D-MR-MS/. Then I moved all patientXX_consensus_gt.nii.gz files to the seg/ folder and all the patientXX_T2W.nii.gz files to the t2/ folder. However, there is an additional folder named raw/ in each patient folder, which also contains a patientXX_T2W.nii.gz file. Should I use the file from that folder? I used the one outside of the raw/ folder.
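
The manual sorting described in the steps above could be scripted roughly as follows. This is only a sketch: the function name `sort_by_suffix`, the suffix mapping, and the choice to copy rather than move are assumptions, and the only layout taken from this thread is the pair of t2/ and seg/ folders. Folders named raw/ are skipped by default, matching the choice described for MSLUB.

```python
import shutil
from pathlib import Path

# Assumed mapping for BraTS21 file names; MSLUB would need for example
# {"_T2W.nii.gz": "t2", "_consensus_gt.nii.gz": "seg"}.
BRATS_SUFFIXES = {"_t2.nii.gz": "t2", "_seg.nii.gz": "seg"}


def sort_by_suffix(src_dir, dst_dir, mapping=BRATS_SUFFIXES, exclude=("raw",)):
    """Copy files under src_dir into dst_dir/<folder>, chosen by file suffix."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = {folder: [] for folder in mapping.values()}
    for path in sorted(src_dir.rglob("*.nii.gz")):
        # Skip anything located inside an excluded sub-folder such as raw/.
        if any(part in exclude for part in path.relative_to(src_dir).parts[:-1]):
            continue
        for suffix, folder in mapping.items():
            if path.name.endswith(suffix):
                target = dst_dir / folder
                target.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, target / path.name)
                copied[folder].append(path.name)
                break
    return copied
```

The returned dictionary lists which files ended up in each folder, which makes it easy to confirm that every t2 volume has a matching segmentation before starting the preprocessing scripts.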

Hi,
@CristianoPatricio This is the correct way. However, I think I got the BraTS 21 data from Synapse directly, but I am not sure as it was 2 years ago.
I spent some time investigating the issue.
When I clone this repo and use my original data that was used for the publication, I get the same results.
However, when I preprocess the images with the preprocessing scripts of this repo, I get different images compared to the images used for publication. This means that the code works, but the preprocessing is not reproducible at the moment.

However, I noticed that there are dependency issues and the code does not run out of the box at the moment.
Could you tell me which versions of the software (antspyx, hd-bet,..) you used?

I will have to do some more investigation and eventually change the preprocessing scripts back to my original preprocessing.
In the meantime, I could provide you with my original data (send me a mail if you are interested: finn.behrendt@tuhh.de).

Hi @FinnBehrendt,

Thank you for the time spent on this issue.
When I cloned the repo, I had some problems during the installation of the requirements, namely the antspyx and numba packages. Below are the versions of the packages you mentioned.

ants: 0.0.7                    
antspyx: 0.4.2                    
hd-bet: 1.0
numba: 0.56.4
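
A version list like the one above can be collected with the standard library, which helps when comparing environments. The sketch below is only an illustration: the helper name `installed_versions` is hypothetical, and the distribution names are the ones mentioned in this thread, which may differ from the names registered in a given environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names as mentioned in this thread (assumed distribution names).
for name, version in installed_versions(["antspyx", "hd-bet", "numba"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Posting this output together with the Python version makes it easier to tell whether differences in preprocessing results come from differing library versions.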

Just sent you an e-mail (please check your mail box/spam).

Thank you.