Could you share more details about the settings of the dataset folder?
Closed this issue · 14 comments
Thanks a lot.
Hi lambert,
We separate data from each site into 80% and 20% for training and testing.
You can refer to the implementation detail at the latest arxiv version: https://arxiv.org/abs/2002.03366
As we pre-processed the three datasets uniformly and renamed the original files, it is not straightforward to know which .nii file (according to the official index) belongs to the training set and which belongs to the testing set.
You may first try random 4:1 split.
Thanks.
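The suggested random 4:1 split could be sketched in Python as follows. This is only an illustration (the case-ID naming and the fixed seed are assumptions, not the split actually used in the paper):

```python
import random

def split_cases(case_ids, train_ratio=0.8, seed=0):
    # Shuffle with a fixed seed so the 4:1 split is reproducible,
    # then cut off the first 80% as the training set.
    rng = random.Random(seed)
    ids = list(case_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]

# Hypothetical case IDs -- each site's cases would be split separately.
train, test = split_cases(["Case%02d" % i for i in range(30)])
print(len(train), len(test))  # 24 6
```

Splitting per site (rather than pooling all sites first) keeps the 80/20 ratio within each site, matching the per-site separation described above.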
Thanks for the details. Does "data_loader.py" include all the pre-processing steps you mentioned, or are they omitted here? I am confused about the dataset folder hierarchy. Could you offer some information like the following sketch? Thank you very much.
-codes
|
-data
|
---- train
| |
| ---- PatientID
| | |
| | ---- 00001.dcm
| |---- PatientID.nrrd
| |---- ....
---- test
| |
| ---- PatientID
| | |
| | ---- 00001.dcm
| |---- PatientID.nrrd
| |---- ....
---- validate
| |
| ---- PatientID
| | |
| | ---- 00001.dcm
| |---- PatientID.nrrd
| |---- ....
Got it. My understanding is that you first transformed the original datasets (e.g. NCI-ISBI 2013 in .dcm and .nrrd) to .nii format and created a list file containing all the path information.
Is that correct?
Yes, exactly.
Thank you very much.
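That list-file step could look something like the sketch below. Everything here is an assumption for illustration (the folder layout, file naming, and list format are not the repo's actual ones):

```python
import os

def write_path_list(data_root, list_path):
    # Walk the pre-processed data folder, collect every .nii / .nii.gz
    # file, and write one path per line to a list file that a data
    # loader could later read.
    paths = []
    for dirpath, _, filenames in os.walk(data_root):
        for name in filenames:
            if name.endswith((".nii", ".nii.gz")):
                paths.append(os.path.join(dirpath, name))
    paths.sort()
    with open(list_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    return paths
```

Sorting the paths keeps the list deterministic across runs, which matters if the train/test split is derived from the list order.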
Hi Quande,
Sorry to disturb you again. I am not sure which files I should download for the I2CVB dataset. Should I download them from this link: https://zenodo.org/record/162231#.XuFYwUVKh3i? If so, how did you unzip the .aa, .ab, ... files?
Thanks a lot.
Hi lambert,
I remember that I also had some trouble when downloading and extracting this dataset, so I contacted its authors for a solution.
The procedure can be summarized as follows:
The files with the suffix .aa, .ab, .ac, ... are split from a single zip file; you need to concatenate them first and then unzip, with the following command:
`cat bigfiledat.* > bigfile.zip` for Mac/Linux
or
`copy /b bigfiledat.* bigfile.zip` for Windows
For the files with the suffix .bin.gz, you need to use gunzip as follows:
`gunzip file.bin.gz` for Mac/Linux
or
use 7-Zip for Windows
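The same concatenate-then-decompress procedure can also be done in Python, which may help when the shell commands behave differently across machines. A minimal sketch (the file names are illustrative, not the actual archive names):

```python
import gzip
import shutil

def join_parts(part_paths, out_path):
    # Byte-concatenate the split parts (.aa, .ab, ...) in sorted
    # order into a single archive, like `cat parts.* > bigfile.zip`.
    with open(out_path, "wb") as out:
        for part in sorted(part_paths):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)

def gunzip(gz_path, out_path):
    # Decompress one .gz file, like `gunzip file.bin.gz`.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Sorting the part paths matters: the .aa, .ab, ... suffixes encode the order in which the bytes must be rejoined.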
I tried the commands for the .aa, .ab, ... files (https://zenodo.org/record/162228#.XuGyoUVKh3i) on Linux and Windows, but I can only get a single large file from the zip without any suffix.
For the other files with .bin and .gz (https://zenodo.org/record/61163#.XuG3gUVKh3i), I tried your command but it failed.
Is there anything I missed? Or, if possible, could you share the processed data with me?
Thanks.
Hi, what command did you use? Maybe you can try `cat mp-mri-prostate.* > bigfile.zip`
Exactly the same. After the command, I only get a zip file that contains one file, "bigfile" (without any suffix).
Hi, can you unzip the file you got?
The procedure above is the same as what the dataset authors told me to do, and I can successfully merge and extract the dataset following those commands.
Hi Quande,
It finally works on Mac. There might have been something wrong with the other Linux PC, which didn't extract the files correctly. I now have 17 patients (19 in your paper?) with several modalities. May I know which modality you used for the experiments and what I should do next? Thanks.
Hi lambert,
This dataset should contain 19 patients; please check whether you missed anything when downloading the dataset.
The T2W modality was used in the experiments; you can refer to our paper for more implementation details.