mahmoodlab/PathomicFusion

Regarding reproducing the GBMLGG survival prediction

cuicathy opened this issue · 2 comments

Hello,

Thanks for sharing the code and data. I have three questions related to survival prediction on the GBMLGG dataset.

  1. I used the released models and data splits to make survival predictions, and I got an average cindex_test of 0.8078 for the Pathomic Fusion model and 0.7104 for the trained GCN model, which are lower than the values reported in the paper (0.826 and 0.746). Do you perhaps know why?

These are the environment and command lines I used to reproduce the results:

torch==1.9.0
torch-cluster==1.5.9
torch-geometric==1.3.0
torch-scatter==2.0.7
torch-sparse==0.6.10

PathomicFusion: test_cv.py --exp_name surv_15_rnaseq --task surv --mode pathgraphomic --model_name pathgraphomic_fusion --niter 10 --niter_decay 20 --lr 0.0001 --beta1 0.5 --fusion_type pofusion_A --mmhid 64 --use_bilinear 1 --use_vgg_features 1 --gpu_ids 0 --omic_gate 0 --grph_scale 2 --use_rnaseq 1 --input_size_omic 320 --reg_type none

GCN: test_cv.py --exp_name surv_15_rnaseq_dbcheck --task surv --mode graph --model_name graph --niter 0 --niter_decay 50 --lr 0.002 --init_type max --reg_type none --lambda_reg 0 --use_vgg_features 1 --gpu_ids 0

  2. According to #4, the mean accuracy over the 9 patches is used for each (1024 x 1024) sample during evaluation. Maybe I missed something, but I could not find the code that averages the 9 patches.

  3. To test the trained CNN model, I ran the command line: --exp_name surv_15_rnaseq --task surv --mode path --model_name path --niter 0 --niter_decay 50 --batch_size 8 --lr 0.0005 --reg_type none --lambda_reg 0 --gpu_ids 0 --use_vgg_features 1, and got the following error: RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 3, 3], but got 2-dimensional input of size [8, 32] instead. I think there is a conflict between the input features (batch size x 32) and the model (which requires 512 x 512 patches as input), so I should change either the code or the input features to make it run, right? A minimal sketch of the mismatch follows below.
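To illustrate, here is a minimal, self-contained reproduction of that shape mismatch. The conv layer and tensors below are hypothetical stand-ins for the model's first layer and my inputs, not the repository's actual code:

```python
import torch
import torch.nn as nn

# Stand-in for the CNN's first conv layer; its weight has shape [64, 3, 3, 3].
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)

images = torch.randn(8, 3, 512, 512)  # a batch of raw 512 x 512 patches
print(conv(images).shape)             # works: torch.Size([8, 64, 510, 510])

features = torch.randn(8, 32)         # a batch of precomputed 32-dim features
try:
    conv(features)                    # 2-D input where 4-D is expected
except RuntimeError as e:
    print(e)  # Expected 4-dimensional input for 4-dimensional weight [64, 3, 3, 3], ...
```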

Thanks for your help!

Hi @cuicathy - thank you for your interest in our repository, and for raising this issue.

Regarding the first and second points, the discrepancies in c-Index performance can be attributed to how we computed the overall c-Index: using predicted risks per patient (not per data sample). To get the predicted risk per patient, the mean of the 9 patch-level predicted risks is computed for each sample, as you suggested. To better reproduce the exact numbers, we recently refactored our utils.py script, with the analysis code separated (and expanded) into its own section. Please also see this Jupyter Notebook for evaluation on GBMLGG and KIRC, with the exact function used for computing the c-Index here.
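For reference, here is a minimal sketch of that per-patient aggregation. The column names (pid, risk, time, event) are illustrative placeholders rather than the exact variables used in utils.py or the notebook:

```python
import pandas as pd
from lifelines.utils import concordance_index

def cindex_per_patient(df: pd.DataFrame) -> float:
    # Collapse the 9 patch-level risks into one mean risk per patient,
    # carrying along the patient-level survival time and event status.
    per_patient = df.groupby('pid').agg(
        risk=('risk', 'mean'),
        time=('time', 'first'),
        event=('event', 'first'),
    )
    # concordance_index expects higher scores for longer survival,
    # so the predicted risk is negated.
    return concordance_index(per_patient['time'],
                             -per_patient['risk'],
                             per_patient['event'])
```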

Regarding the third point, path-only evaluation should be run on images, so the 512 x 512 patches should be the inputs (using the PathgraphomicDatasetLoader).

Apologies for the confusion; evaluation (with many combinations of multimodal fusion across different tasks) is complex.

Thanks for your answers. They are helpful!