Expression data preprocessing

Question

mattiasaine opened this issue 3 years ago · 2 comments

Hi,

I would like to use MCP-counter on multiple data sets, but I have a question about your recommendations for preprocessing.

For TCGA, for example, the GDC portal allows download of gene-level counts, FPKM and FPKM-UQ. Can FPKM, for instance, be run as-is in MCP-counter, or is some processing recommended (log transform, centering, etc.)?

I also have some other RNA-seq sources that were not run through the GDC pipeline (similar, though). Any generic recommendations for these?

Appreciate your input!

Answer 1 · 2021-09-30T09:29:14.000Z

Hi,
There is no universally good way to normalize data, but my personal recommendation would be to convert to TPM and then apply log2(1+TPM) before running MCP-counter. The same holds for all sources of data (TCGA or other). FPKM can easily be converted to TPM: see https://bioinfo.umassmed.edu/content/pdf2016fall/normalization.pdf (with R code for the conversion at the end).
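To make the two steps concrete, here is a minimal sketch in plain Python (not the R code from the linked slides). It assumes the usual per-sample relationship TPM_i = FPKM_i / sum(FPKM) * 1e6, followed by log2(1+TPM). The function names and the dict-of-lists input layout are hypothetical and chosen only for illustration.

```python
import math


def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values to TPM.

    Assumes TPM_i = FPKM_i / sum(FPKM) * 1e6, computed within a single sample.
    """
    total = sum(fpkm)
    if total <= 0:
        raise ValueError("sample has no expression; cannot rescale to TPM")
    return [value / total * 1e6 for value in fpkm]


def log2_tpm(tpm):
    """Apply the log2(1 + TPM) transform to one sample."""
    return [math.log2(1.0 + value) for value in tpm]


def preprocess(fpkm_by_sample):
    """Hypothetical helper: map {sample name: FPKM list} to log2(1 + TPM)."""
    return {
        sample: log2_tpm(fpkm_to_tpm(values))
        for sample, values in fpkm_by_sample.items()
    }


if __name__ == "__main__":
    # Toy values for three genes in two samples, not real data.
    toy = {"sample_A": [1.0, 1.0, 2.0], "sample_B": [0.0, 5.0, 5.0]}
    for sample, values in preprocess(toy).items():
        print(sample, [round(v, 3) for v in values])
```

The rescaling is done per sample, so each sample's TPM values sum to one million regardless of its original FPKM total. The transformed gene-by-sample values are what would then be given to MCP-counter.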

Answer 2 · 2021-09-30T13:37:17.000Z

Good to hear, log2(1+TPM) is what I commonly use. As you say, there is no one right way out there, but at least now I know how you would do it.

Thank you for the rapid feedback!