mahmoodlab/PathomicFusion

Omic data normalization

hathawayxxh opened this issue · 1 comments

Hi, I have a question about the data normalization. I noticed that the RNAseq data provided in the pickle file (e.g., gbmlgg15cv_all_st_1_0_0_rnaseq.pkl) is different from the RNAseq data provided in the txt files (mRNA_Expression_z-Scores_RNA_Seq_RSEM.txt and mRNA_Expression_Zscores_RSEM.txt).
My guess is that the txt files contain the raw data downloaded from cBioPortal, and that the data in the pickle file was generated after some kind of normalization. I used the raw data for my experiments, but the performance is much worse than with the pickle data, so I think the normalization must be very important. Could the authors please tell me which normalization was used to preprocess the omic input data? Thanks a lot!

Hi @hathawayxxh

After downloading data from cBioPortal, we normalized the input using StandardScaler from scikit-learn. Hope this helps!
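
For anyone trying to reproduce this, here is a minimal sketch of what normalizing with StandardScaler could look like. It is not the authors' actual preprocessing script: the arrays `X_train` and `X_test` are hypothetical stand-ins for an expression matrix with samples as rows and genes as columns, and fitting the scaler on the training split only is an assumption, since the reply does not say how the scaler was fit.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for an expression matrix:
# rows are samples (patients), columns are features (genes).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
X_test = rng.normal(loc=5.0, scale=3.0, size=(20, 4))

# StandardScaler standardizes each column: subtract the column mean
# and divide by the column standard deviation.
scaler = StandardScaler()

# Assumption: fit on the training split only, then reuse the same
# statistics for the test split. The thread does not confirm this.
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # per-feature means, close to 0
print(X_train_scaled.std(axis=0))   # per-feature standard deviations, close to 1
```

StandardScaler works column by column, so if your table has genes as rows and samples as columns, transpose it before scaling.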