Could you share more details about the settings of the dataset folder?
Closed this issue · 14 comments
Thanks a lot.
Hi lambert,
We separate data from each site into 80% and 20% for training and testing.
You can refer to the implementation detail at the latest arxiv version: https://arxiv.org/abs/2002.03366
As we pre-processed the three datasets uniformly and renamed the original files, it is not straightforward to know which .nii file (according to the official index) belongs to the training set and which belongs to the testing set.
You may first try random 4:1 split.
Thanks.
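The suggested random 4:1 split could be sketched in Python as follows. This is only an illustration (the case-ID naming and the fixed seed are assumptions, not the split actually used in the paper):

```python
import random

def split_cases(case_ids, train_ratio=0.8, seed=0):
    # Shuffle with a fixed seed so the 4:1 split is reproducible,
    # then cut off the first 80% as the training set.
    rng = random.Random(seed)
    ids = list(case_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]

# Hypothetical case IDs -- each site's cases would be split separately.
train, test = split_cases(["Case%02d" % i for i in range(30)])
print(len(train), len(test))  # 24 6
```

Splitting per site (rather than pooling all sites first) keeps the 80/20 ratio within each site, matching the per-site separation described above.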
Thanks for the details. Does "data_loader.py" include all the pre-processing steps you mentioned, or are they omitted here? I am confused about the dataset folder hierarchy. Could you offer some information like the following sketch? Thank you very much.
-codes
|
-data
|
---- train
| |
| ---- PatientID
| | |
| | ---- 00001.dcm
| |---- PatientID.nrrd
| |---- ....
---- test
| |
| ---- PatientID
| | |
| | ---- 00001.dcm
| |---- PatientID.nrrd
| |---- ....
---- validate
| |
| ---- PatientID
| | |
| | ---- 00001.dcm
| |---- PatientID.nrrd
| |---- ....
Got it. My understanding is that you first transformed the original datasets (e.g. NCI-ISBI 2013 in .dcm and .nrrd) to .nii format and created a list file containing all the path information.
Is that correct?
Yes, exactly.
Thank you very much.
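That list-file step could look something like the sketch below. Everything here is an assumption for illustration (the folder layout, file naming, and list format are not the repo's actual ones):

```python
import os

def write_path_list(data_root, list_path):
    # Walk the pre-processed data folder, collect every .nii / .nii.gz
    # file, and write one path per line to a list file that a data
    # loader could later read.
    paths = []
    for dirpath, _, filenames in os.walk(data_root):
        for name in filenames:
            if name.endswith((".nii", ".nii.gz")):
                paths.append(os.path.join(dirpath, name))
    paths.sort()
    with open(list_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    return paths
```

Sorting the paths keeps the list deterministic across runs, which matters if the train/test split is derived from the list order.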
Hi Quande,
Sorry to disturb you again. I am not sure which files I should download for the I2CVB dataset. Should I download them from this link: https://zenodo.org/record/162231#.XuFYwUVKh3i? If so, how did you unzip the .aa, .ab, ... files?
Thanks a lot.
Hi lambert,
I remember that I also had some trouble when downloading and extracting this dataset, so I contacted its authors for a solution.
The procedure can be summarized as follows:
The files with the suffix .aa, .ab, .ac, ... are split from a single zip file; you need to concatenate them first and then unzip, with the following command:
`cat bigfiledat.* > bigfile.zip` for Mac/Linux
or
`copy /b bigfiledat.* bigfile.zip` for Windows
For the files with the suffix .bin.gz, you need to use gunzip as follows:
`gunzip file.bin.gz` for Mac/Linux
or
use 7-Zip for Windows
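The same concatenate-then-decompress procedure can also be done in Python, which may help when the shell commands behave differently across machines. A minimal sketch (the file names are illustrative, not the actual archive names):

```python
import gzip
import shutil

def join_parts(part_paths, out_path):
    # Byte-concatenate the split parts (.aa, .ab, ...) in sorted
    # order into a single archive, like `cat parts.* > bigfile.zip`.
    with open(out_path, "wb") as out:
        for part in sorted(part_paths):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)

def gunzip(gz_path, out_path):
    # Decompress one .gz file, like `gunzip file.bin.gz`.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Sorting the part paths matters: the .aa, .ab, ... suffixes encode the order in which the bytes must be rejoined.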
I tried the commands for the .aa, .ab, ... files (https://zenodo.org/record/162228#.XuGyoUVKh3i) on Linux and Windows, but I can only get a single large file from the zip without any suffix.
For the other files with .bin and .gz (https://zenodo.org/record/61163#.XuG3gUVKh3i), I tried your command but it failed.
Is there anything I missed? Or, if possible, could you share the processed data with me?
Thanks.
Hi, what command did you use? Maybe you can try `cat mp-mri-prostate.* > bigfile.zip`
Exactly the same. After the command, I only get a zip file that contains one file, "bigfile" (without any suffix).
Hi, can you unzip the file you got?
The procedure above is the same as what the dataset authors told me to do, and I can successfully merge and extract the dataset following those commands.
Hi Quande,
It finally works on Mac. There might have been something wrong with the other Linux PC, which didn't extract the files correctly. I now have 17 patients (19 in your paper?) with several modalities. May I know which modality you used for the experiments and what I should do next? Thanks.
Hi lambert,
This dataset should contain 19 patients; please check whether you missed anything when downloading the dataset.
The T2W modality was used in the experiments; you can refer to our paper for more implementation details.