jackyko1991/vnet-tensorflow

How to handle multichannel inputs?

Opened this issue · 9 comments

Hi,

I am looking into your TensorFlow implementation of V-Net and it's really interesting.
I have a question about the multichannel inputs, because you say that your code handles them.

In the get_dataset function in the NiftiDataset.py file, for each case you only append one image_path, i.e. you are only considering one image per case, as we can see in the code lines below.

for case in os.listdir(self.data_dir):
    image_paths.append(os.path.join(self.data_dir,case,self.image_filename))
    label_paths.append(os.path.join(self.data_dir,case,self.label_filename))

How do I adapt the code if I have 4 input images per case, as in the tree below?
.
├── ...
├── data
│   ├── testing
│   │   ├── case1
│   │   │   ├── img_1.nii.gz
│   │   │   ├── img_2.nii.gz
│   │   │   ├── img_3.nii.gz
│   │   │   ├── img_4.nii.gz
│   │   │   └── label.nii.gz
│   │   ├── case2
│   │   │   ├── img_1.nii.gz
│   │   │   ├── img_2.nii.gz
│   │   │   ├── img_3.nii.gz
│   │   │   ├── img_4.nii.gz
│   │   │   └── label.nii.gz
.
.
.

Thanks

OK, what I did is concatenate the 4 3D NIfTI files into one 4D NIfTI file.
I'll try to run the code and see if it works now.
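
For reference, the concatenation can be done with SimpleITK roughly like this (just a sketch; the case folder and output file name are placeholders matching the tree above, and all four volumes are assumed to share the same size and spacing):

import os
import SimpleITK as sitk

case_dir = "./data/training/case1"  # hypothetical case folder
channel_files = ["img_1.nii.gz", "img_2.nii.gz", "img_3.nii.gz", "img_4.nii.gz"]

# read the four 3D volumes and join them along a new 4th dimension
volumes = [sitk.ReadImage(os.path.join(case_dir, f)) for f in channel_files]
image_4d = sitk.JoinSeries(volumes)

# write the result back as a single 4D NIfTI
sitk.WriteImage(image_4d, os.path.join(case_dir, "img_4d.nii.gz"))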

@AchilleMascia Sorry, the network itself is multi-channel ready, but I haven't written the input pipeline and TensorBoard visualization for that yet.

There are two approaches to load the data. One uses a 4D NIfTI as you suggested, which is easier. The other is to read the data in a for loop, which you can do here:

# read image and label
image = self.read_image(image_path.decode("utf-8"))

Please note that the preprocessing transforms currently only work on 3D images, so you will need to loop through the channels if necessary. ITK does support 4D images, but not for every filter.
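
If you go with the for-loop approach, a rough sketch would be to read each channel, run the 3D transforms on it separately, and stack the results (the transform call signature below is an assumption for illustration, not the exact NiftiDataset API):

import os
import numpy as np
import SimpleITK as sitk

def read_case_multichannel(case_dir, channel_files, transforms_3d):
    # transforms_3d is assumed to be a list of callables that take and
    # return a 3D SimpleITK image (not the repo's exact transform API)
    channels = []
    for fname in channel_files:
        img = sitk.ReadImage(os.path.join(case_dir, fname))
        for t in transforms_3d:
            img = t(img)
        channels.append(sitk.GetArrayFromImage(img))  # (z, y, x)
    # stack along a trailing channel axis: (z, y, x, n_channels)
    return np.stack(channels, axis=-1)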

You will also need to change the number of channels of the image placeholder in train.py.
I think this should be fine, except for the TensorBoard stuff.
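
For example, something along these lines (a sketch only; the variable names and example values are assumptions, check train.py for the actual placeholder definition):

import tensorflow as tf  # tf1.x-style API, as used by the repo

patch_size, patch_layer, num_channels = 128, 32, 4  # example values only

# hypothetical multi-channel image placeholder: [batch, x, y, z, channels]
images_placeholder = tf.placeholder(
    tf.float32,
    shape=[None, patch_size, patch_size, patch_layer, num_channels],
    name="images_placeholder")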

Thank you for your answer, I'll try reading the data with a loop since I would like to preprocess the inputs.
I have another issue when trying to run the code (with one input channel for the sake of the experiment).
I have a GPU with 8GB of memory but it runs out of memory.

  1. Could you tell me which hyper-parameters I can play with in order to reduce the memory usage?
    I am assuming the patch_size, patch_layer, epochs and the architecture of the V-Net... Could you give a configuration of these parameters that would use fewer resources?

  2. Also, could you explain why in your code the data_dir for the TestDataset is the train_data_dir, just like for the TrainDataset?

  3. Finally, which parameter tells us the number of transformations we perform on the training set during the data augmentation phase?

Thank you very much for your help !

@AchilleMascia

  1. This is hard to answer, as you need to adjust it according to the data itself. It depends on the label-to-background volume ratio and the resolution of the medical images. For an 8GB GPU I guess 256 patch_size * 64 patch_layer should be possible. You can check GPU usage with nvidia-smi -l 1 in a bash shell to see the instantaneous GPU memory usage during training. This needs to be changed jointly with NiftiDataset.Resample((0.45,0.45,0.45)) to get an optimal label-to-background ratio (see the rough calculation after the code block below).

  2. It is just a typo.

  3. Check this code:

trainTransforms = [
                NiftiDataset.StatisticalNormalization(2.5),
                # NiftiDataset.Normalization(),
                NiftiDataset.Resample((0.45,0.45,0.45)),
                NiftiDataset.Padding((FLAGS.patch_size, FLAGS.patch_size, FLAGS.patch_layer)),
                NiftiDataset.RandomCrop((FLAGS.patch_size, FLAGS.patch_size, FLAGS.patch_layer),FLAGS.drop_ratio,FLAGS.min_pixel),
                NiftiDataset.RandomNoise()
                ]

This will run once for each data sample.
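
As a rough back-of-envelope for why patch_size and the Resample spacing need to be changed together (the numbers below are only illustrative):

# physical extent covered by one training patch, assuming isotropic
# spacing after NiftiDataset.Resample((0.45, 0.45, 0.45))
patch_size, patch_layer = 256, 64      # voxels
spacing = 0.45                         # mm per voxel after resampling

fov_xy = patch_size * spacing          # 256 * 0.45 = 115.2 mm in-plane
fov_z = patch_layer * spacing          # 64 * 0.45 = 28.8 mm through-plane

# finer resampling (smaller spacing) shrinks the physical volume covered by
# the same patch, which changes the label-to-background ratio inside each
# crop; shrinking patch_size/patch_layer saves GPU memory but also coverage.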

Thank you for your answers @jackyko1991!
Could you tell me how to change the parameters in NiftiDataset.Resample((0.45,0.45,0.45)) depending on the label-to-background ratio?
Actually, I don't yet understand in what way the label-to-background ratio is important in the segmentation problem. I believe it's for the training phase, to see how imbalanced the classes are, but I would really appreciate it if you could tell me what needs to be changed depending on this ratio.

Thanks a lot !

A segmentation problem is fundamentally a classification problem, down to the pixel level.

For simplicity this V-Net is optimized for binary segmentation, which means the desired tissue pixels are labeled as class 1 and all other pixels are labeled as class 0.

For 3D images the pixel counts of class 1 and class 0 can differ a lot. Imagine you need to find a tumor with a diameter of 1 cm within your body; this can be a 0.0000000001:1 ratio. That means the labels are highly imbalanced and most of the training time you would be training on the 0 class only.
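
If you want to check this imbalance on your own labels, a quick way is something like this (the label path is just a placeholder):

import numpy as np
import SimpleITK as sitk

label = sitk.GetArrayFromImage(sitk.ReadImage("./data/training/case1/label.nii.gz"))
foreground = np.count_nonzero(label)
background = label.size - foreground

print("foreground fraction: {:.4%}".format(foreground / label.size))
print("label-to-background ratio: 1:{:.0f}".format(background / max(foreground, 1)))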

To reduce this problem I introduced some mechanisms like confidence crop, or random crop with a certain chance that the crop must contain a label volume greater than a user-preset value. It is hard for me to say how to adjust these values to achieve a good training result, as this differs from data to data.
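
Conceptually, the random crop mechanism behaves roughly like the sketch below (only an illustration of the idea behind the drop_ratio and min_pixel parameters, not the repo's actual RandomCrop implementation):

import random
import numpy as np

def random_crop_with_label_check(image, label, crop_shape, drop_ratio, min_pixel):
    # keep drawing random crops until one contains at least min_pixel labeled
    # voxels; a crop without enough label is only accepted occasionally
    while True:
        start = [random.randint(0, label.shape[d] - crop_shape[d]) for d in range(3)]
        region = tuple(slice(start[d], start[d] + crop_shape[d]) for d in range(3))
        if np.count_nonzero(label[region]) >= min_pixel:
            return image[region], label[region]
        if random.random() > drop_ratio:
            # not enough labeled voxels, but keep the crop anyway;
            # with drop_ratio close to 1 such crops are almost always re-drawn
            return image[region], label[region]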

What I can tell you is to observe the images in TensorBoard; the white and black regions of the label should be roughly comparable in size.

For small-tissue segmentation, I would suggest an R-CNN-style network, or combining the V-Net with a detection network for region proposals.

I am trying to use your model for ischemic stroke lesion segmentation.
The ratio of lesion voxels to the total number of voxels is around 0.5%.

I have a few other questions on the model:

  1. When you say that the code below is run once for each data sample:

trainTransforms = [
                NiftiDataset.StatisticalNormalization(2.5),
                # NiftiDataset.Normalization(),
                NiftiDataset.Resample((0.45,0.45,0.45)),
                NiftiDataset.Padding((FLAGS.patch_size, FLAGS.patch_size, FLAGS.patch_layer)),
                NiftiDataset.RandomCrop((FLAGS.patch_size, FLAGS.patch_size, FLAGS.patch_layer),FLAGS.drop_ratio,FLAGS.min_pixel),
                NiftiDataset.RandomNoise()
                ]

You mean that for each epoch of the training phase we apply these to the data we have and then keep 5 patches (trainDataset.shuffle(buffer_size=5)), right?

  2. I didn't quite understand when we use the evaluation folder. I would train on the training folder, validate on the evaluation folder, and once the model is trained, test it on the testing folder. In your case we only use the training and testing folders.
    Could you please explain the difference between our approaches?
  1. The transformations will be applied at every step, for every data sample. The batch shuffle creates a buffer pool of 5 images, preprocessed on the CPU in multiple threads. This helps to prepare batches more efficiently (see the sketch after this list).

  2. The evaluation folder is for deployment purposes. In the training stage only patches of the images are used because of memory constraints; to run through a whole image you will need the evaluation step.
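
For reference, the buffering described in point 1 corresponds roughly to a tf.data pipeline like the following (a sketch with dummy data and assumed values, not the exact train.py code):

import numpy as np
import tensorflow as tf  # tf1.x-style tf.data API, as used by the repo

# dummy stand-in for the patches produced by the NiftiDataset transforms;
# in train.py the dataset comes from the NiftiDataset input pipeline instead
images = np.zeros((20, 64, 64, 16, 1), dtype=np.float32)
labels = np.zeros((20, 64, 64, 16, 1), dtype=np.int32)
trainDataset = tf.data.Dataset.from_tensor_slices((images, labels))

trainDataset = trainDataset.shuffle(buffer_size=5)  # pool of 5 preprocessed samples
trainDataset = trainDataset.batch(1)
trainDataset = trainDataset.prefetch(1)             # overlap CPU work with GPU steps

iterator = trainDataset.make_initializable_iterator()
next_images, next_labels = iterator.get_next()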

Hi Jacky, I'm aware this is an old issue, but I want you to know that I'm working on some code to accept 4D NIfTI files and perform the preprocessing transforms on 3D slices. If you're still maintaining this repo, I'll submit a pull request once I have a working branch. Let me know.