mskcc/RNAseqDB

Normalized GTEx data has a truncated number of samples

Closed this issue · 2 comments

Hi,

I would like to ask whether I could use the expected_count data instead of the normalized data. I noticed that the normalized GTEx data (FPKM), breast-rsem-fpkm-gtex.txt, has only 89 samples, but the number of samples reported in the table in the paper is 218. May I know why? Were some samples excluded during normalization, or was the file truncated accidentally?
See:
https://github.com/mskcc/RNAseqDB/tree/master/data/normalized/breast-rsem-fpkm-gtex.txt.gz

Could you help clarify?
Thanks very much.

Regards
Herty

Thank you for your interest in our data. Yes, you can certainly use the expected_count data instead of the normalized data.

Regarding your question about the GTEx data: 126 of the 218 GTEx samples are from males. Because TCGA breast tumors are from females only, we kept only the female samples from the GTEx data to make it comparable with the TCGA data.
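
For anyone who wants to apply the same kind of sex-based filtering to the expected_count files, here is a minimal sketch of the idea. It is not the script used to build this repository. It assumes a tab-separated matrix whose first two columns are gene annotations and whose remaining columns are samples, plus a hypothetical `sex_by_sample` mapping that you would have to build yourself from the GTEx sample annotations. The function name, the toy sample IDs, and the gene names are all made up for illustration.

```python
import csv
import io


def filter_columns_by_sex(expr_tsv, sex_by_sample, keep="female", id_columns=2):
    """Keep the annotation columns plus the sample columns whose sex is `keep`.

    expr_tsv      -- text of a tab-separated expression table (header + rows)
    sex_by_sample -- hypothetical dict mapping sample ID -> "female" / "male"
    id_columns    -- number of leading non-sample columns (assumed to be 2)
    """
    reader = csv.reader(io.StringIO(expr_tsv), delimiter="\t")
    header = next(reader)

    # Indices of the annotation columns, then of every matching sample column.
    keep_idx = list(range(id_columns)) + [
        i
        for i, name in enumerate(header)
        if i >= id_columns and sex_by_sample.get(name) == keep
    ]

    rows = [[header[i] for i in keep_idx]]
    for row in reader:
        rows.append([row[i] for i in keep_idx])
    return rows


# Toy input: three made-up samples, one of them annotated as male.
toy = (
    "Hugo_Symbol\tEntrez_Gene_Id\tS1\tS2\tS3\n"
    "GENE_A\t1\t1.0\t2.0\t3.0\n"
    "GENE_B\t2\t4.0\t5.0\t6.0\n"
)
sex = {"S1": "female", "S2": "male", "S3": "female"}

filtered = filter_columns_by_sex(toy, sex)
n_samples = len(filtered[0]) - 2
print(n_samples, "samples kept:", filtered[0][2:])
```

Samples missing from the mapping are dropped rather than kept, so it is worth comparing the number of retained columns against the count you expect.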

I see, noted. Thanks, QiWang, for your prompt reply.