vkola-lab/brain2020

Problem with model convergence

czp19940707 opened this issue · 6 comments

Hi Shangran Qiu,
Thank you for providing the code for your Brain 2020 paper, but I have the following questions.

  1. Can randomly selecting 47×47×47 patches ensure that the features are distinguishable? If AD and NC subjects are very similar in some patches but those patches are given different labels during training, the model may be hard to converge.
  2. In the inference stage, the linear layer is transformed directly into a conv layer by dense_to_conv(). However, a conv block is a locally connected structure, while a linear layer is fully connected. Does it make sense to transform it directly?
  3. MMSE features directly improve the classification performance of the MLP model (according to the formula DEMOR = [(DEMOR[0] - 70.0) / 10.0] + GENDER + [(DEMOR[2] - 27) / 2], AD subjects get MMSE < 0 and NC subjects get MMSE > 0), but MMSE features carry strong prior knowledge that cannot be obtained directly from imaging. Can this model be applied directly in clinical practice?

Hi, thanks for your questions!

  1. Since the patches were randomly sampled from the whole volume according to a uniform distribution 3000 times, every region should be covered by the patches. Instead of assuming which regions are the same or different between AD and NC, we let the model figure out the regions of interest automatically by giving it all patches with equal probability. In the MCC heatmap in Figure 2, you can see which regions contribute to a higher MCC value, which indicates the anatomical structures that are most distinguishable when separating AD from NC.
  2. Transforming a dense layer into a conv layer is mathematically equivalent. You can do an experiment like the first sketch after this list: send an MRI through the model before transforming the dense layer and record the output number; then send the same volume through the model after transforming the dense layer into a conv layer, and you will see that the outputs are exactly the same. To make sure that dropout doesn't introduce randomness, you need to switch the model into inference mode.
  3. That formula simply normalizes the age and MMSE features: age = (age - 70) / 10, mmse = (mmse - 27) / 2 (see the second sketch below). We also discussed the advantages and limitations of applying this method clinically in the discussion section of the paper.
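
For the equivalence check in point 2, here is a minimal PyTorch sketch; the layer shapes (C, D, H, W and two output classes) are illustrative placeholders, not the exact dimensions of the repository's FCN:

```python
import torch
import torch.nn as nn

# Illustrative shapes; the real ones come from the FCN backbone.
C, D, H, W = 64, 6, 6, 6
dense = nn.Linear(C * D * H * W, 2)

# Equivalent conv layer: the kernel covers the entire feature map, so each
# output channel computes the same weighted sum as the dense layer.
conv = nn.Conv3d(C, 2, kernel_size=(D, H, W))
with torch.no_grad():
    conv.weight.copy_(dense.weight.view(2, C, D, H, W))
    conv.bias.copy_(dense.bias)

dense.eval(); conv.eval()  # inference mode, so dropout (if any) is disabled
x = torch.randn(1, C, D, H, W)
out_dense = dense(x.flatten(1))   # shape (1, 2)
out_conv = conv(x).flatten(1)     # shape (1, 2)
print(torch.allclose(out_dense, out_conv, atol=1e-5))  # True
```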
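
And for point 3, a literal reading of the quoted normalization formula (treating index 0 as age and index 2 as MMSE is an assumption taken from that formula, not a documented API):

```python
def normalize_demographics(demor, gender):
    # demor[0]: age, demor[2]: MMSE (indices assumed from the quoted formula)
    age = (demor[0] - 70.0) / 10.0
    mmse = (demor[2] - 27.0) / 2.0
    return [age] + gender + [mmse]
```

Since the MMSE anchor here is 27, subjects with MMSE below 27 (typical for AD) get a negative normalized value and subjects above it a positive one, which matches the observation in the question.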

Hi, thank you for your reply!
How should I understand 'let the model figure out the regions of interest automatically by giving the model all patches with equal probability'? The model will find the most distinguishable anatomical structures and generate probability maps after training, but we have to train the model first. We still use patches to train the model, and the label of a patch is taken from the subject-level label of the sMRI file. If the center of a patch is located in the cerebellum (where AD and NC have the same anatomical structure), label_AD = 1 and label_NC = 0, so the training process will be ruined.

Hi, sure, let me clarify. First of all, we formulated the deep learning framework to predict a subject's AD probability map ("disease probability map" is the term we used in our paper) using MRI as input. Our model is an FCN, which can either take a patch as input and predict a scalar, the probability that the subject has AD, or take the whole MRI as input and predict a map of AD probabilities that can be considered a saliency map. Because the FCN is applied to the whole MRI during the inference stage, we wanted to make sure it was trained comprehensively with patches sampled from everywhere. At the same time, we didn't want to bias the FCN model towards any region/patch, which is why we sampled patches according to a uniform distribution. In other words, a patch has an equal, constant probability of being sampled from anywhere (a minimal sampler is sketched below). I hope this clarifies the question you asked.
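
A minimal NumPy sketch of such a uniform patch sampler (the 47×47×47 patch size comes from this thread; the volume shape is a hypothetical MNI-space example, and the repository's own sampler may differ in details):

```python
import numpy as np

PATCH = 47  # 47x47x47 patches, as mentioned in the thread

def sample_patch(volume, rng):
    """Draw one PATCH^3 sub-volume whose corner is uniform over all valid positions."""
    d, h, w = volume.shape
    x = rng.integers(0, d - PATCH + 1)
    y = rng.integers(0, h - PATCH + 1)
    z = rng.integers(0, w - PATCH + 1)
    return volume[x:x + PATCH, y:y + PATCH, z:z + PATCH]

rng = np.random.default_rng(0)
mri = np.zeros((181, 217, 181), dtype=np.float32)  # hypothetical volume shape
patch = sample_patch(mri, rng)
print(patch.shape)  # (47, 47, 47)
```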

To address your second concern, I agree with you that some regions are more correlated with the AD label, while other regions might be quite irrelevant to it. You assume that AD and NC subjects have similar anatomical structures around the cerebellum, but that is your prior knowledge of AD. In our work, we decided not to use any prior knowledge about which regions are more relevant. Instead, we trained on all patches using the subject-level label (AD or NC). You mentioned the cerebellum, and I can give an even more extreme example: patches on the background, which are identical for AD and NC subjects. But the training process won't be ruined just because of this; instead, if you look at the risk map in Figure 2, you will find that the predicted AD probability on the background is around 0.5, which indicates maximal uncertainty. At the same time, the MCC heatmap in Figure 2 indicates which regions contribute to highly accurate predictions and which do not. We considered the regions with high MCC values as the ROI for AD versus NC.
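
For reference, the MCC at one voxel can be computed from the confusion counts obtained by thresholding the local AD probability across subjects; a small sketch (the 0.5 threshold is an assumption for illustration, not necessarily the paper's exact procedure):

```python
import math

def voxel_mcc(probs, labels, thresh=0.5):
    """Matthews correlation coefficient at one voxel.
    probs: predicted AD probabilities at this voxel, one per subject.
    labels: true subject labels (1 = AD, 0 = NC)."""
    preds = [int(p > thresh) for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A background voxel whose probability hovers around 0.5 yields near-chance predictions and hence an MCC near 0, so it neither ruins training nor shows up as ROI.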

I got it, thank you very much!

Hi, I can reproduce the same risk map with the model you provided (fcn_2760.pth), but when I train the FCN model on my own data, the training loss does not decline, and the risk map looks all white (the probability is 0.5 at every pixel).
My data were selected from ADNI (MPRAGE) and preprocessed with steps 1-4; patches are randomly sampled at 47×47×47 and augmented by the Augment class, with the hyperparameters set according to config.json.

Would you please tell me how to train the FCN model?
Thank you very much!

Hi, may I know how long you trained your FCN model? From what you described, if the predicted risk map values are all close to 0.5, I would guess you didn't train it for enough epochs. Each epoch samples only a single random patch from each individual. We trained the FCN model for 2000-3000 epochs, so the number of epochs here is equivalent to the number of random patches sampled from the whole volume (a sketch of this schedule follows). Hope this is helpful!
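
To make the schedule concrete, here is a minimal sketch of one-patch-per-subject-per-epoch training; the model, optimizer, and data below are hypothetical stand-ins, not the repository's actual pipeline:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for illustration only.
model = nn.Sequential(nn.Flatten(), nn.Linear(47 ** 3, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
subjects = [(torch.randn(181, 217, 181), torch.tensor([i % 2])) for i in range(4)]

for epoch in range(3000):  # 2000-3000 epochs, as noted above
    for volume, label in subjects:
        # one fresh uniformly sampled 47^3 patch per subject per epoch
        d, h, w = volume.shape
        x = torch.randint(0, d - 47 + 1, (1,)).item()
        y = torch.randint(0, h - 47 + 1, (1,)).item()
        z = torch.randint(0, w - 47 + 1, (1,)).item()
        patch = volume[x:x + 47, y:y + 47, z:z + 47].unsqueeze(0)  # (1, 47, 47, 47)
        optimizer.zero_grad()
        loss = criterion(model(patch), label)
        loss.backward()
        optimizer.step()
```

Over 3000 epochs, each subject contributes roughly 3000 distinct random patches, which is what lets the FCN cover the whole volume despite seeing only one patch per subject per epoch.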