ebecht/MCPcounter

Expression data preprocessing

mattiasaine opened this issue · 2 comments

Hi,

I would like to use MCPcounter on multiple data sets, but I have a question about your recommendations for preprocessing.

For TCGA, for example, the GDC portal allows download of gene-level counts, FPKM and FPKM-UQ. Can FPKM, for instance, be run as-is in MCPcounter, or is some processing recommended (log transform, centering, etc.)?

I also have some other RNA-seq sources that were not processed with the GDC pipeline (a similar one, though). Any generic recommendations for these?

Appreciate your input!

Hi,
There is no universally good way to normalize data, although my personal recommendation would be to use TPM and then apply log2(1+TPM) before running MCP-counter. The same holds for all sources of data (TCGA or other). FPKM can easily be transformed into TPM: see https://bioinfo.umassmed.edu/content/pdf2016fall/normalization.pdf (with R code at the end for the conversion).
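
For readers who want to see the arithmetic, here is a minimal sketch of the two steps in plain Python. It assumes the usual per-sample relation TPM_i = FPKM_i / sum(FPKM) * 1e6; it is not taken from the linked slides or from MCP-counter itself. The function names and the toy gene values are hypothetical.

```python
import math


def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so that they sum to one million (TPM)."""
    total = sum(fpkm)
    if total == 0:
        raise ValueError("sample has zero total FPKM; cannot convert to TPM")
    return [value / total * 1e6 for value in fpkm]


def log2_one_plus(values):
    """Apply log2(1 + x) element-wise; zeros stay at zero."""
    return [math.log2(1.0 + value) for value in values]


# Hypothetical toy sample: gene symbol -> FPKM (values made up for illustration).
sample_fpkm = {"CD3D": 10.0, "CD8A": 30.0, "CD19": 60.0}

tpm = fpkm_to_tpm(list(sample_fpkm.values()))
log_expr = dict(zip(sample_fpkm, log2_one_plus(tpm)))
```

The conversion is done one sample (one column of the expression matrix) at a time, so each sample's TPM values sum to one million before the log transform. The resulting genes-by-samples matrix is what would then be passed to MCP-counter.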

Good to hear, log2(1+TPM) is what I commonly use. As you say, there is no one right way, but at least now I know how you would do it.

Thank you for the rapid feedback!