ay-lab/dcHiC

Normalized counts or raw counts?

Nico-FR opened this issue · 2 comments

Hi,
I usually use normalized counts (between 0 and 1) instead of raw (integer) counts for matrix processing. But for loop analysis with FitHiC, we must use raw counts together with a bias file containing the normalization vector.

What do you advise for compartment analysis with your tool: normalized or raw counts? I think we should use normalized matrices to account for some biases.

Does the normalized bedgraph (using quantile normalization) in some way replace the normalization of the matrices? I think it is only useful for comparing between samples, right?

This is a really good question. We did an internal (unpublished) comparison between compartments called from normalized and raw counts. We found that raw counts best preserve biological features such as the laminB1 signal, so we decided to go ahead with raw-count compartment analysis. In our dcHiC Nat Commun paper (https://www.nature.com/articles/s41467-022-34626-6) we captured all the relevant biological features using raw counts. It should also be noted that during compartment calling we perform a distance normalization followed by a correlation calculation on the raw counts, which probably removes most of the biases from the data.
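To make the "distance normalization followed by correlation" step concrete, here is a minimal Python sketch of the general idea. It is not dcHiC's actual code: the function names `distance_normalize` and `correlation_matrix` are hypothetical, the input is assumed to be a dense, symmetric intra-chromosomal matrix of raw counts, and the expected value at each distance is assumed to be the plain mean of that diagonal.

```python
import numpy as np

def distance_normalize(counts):
    """Observed/expected: divide each contact by the mean count at its genomic distance.

    Assumes `counts` is a square, symmetric matrix of raw counts (an assumption
    of this sketch, not a statement about dcHiC's input format).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    oe = np.zeros_like(counts)
    for d in range(n):
        diag = np.diagonal(counts, offset=d)  # all contacts at distance d
        expected = diag.mean()
        if expected > 0:
            idx = np.arange(n - d)
            oe[idx, idx + d] = diag / expected
            oe[idx + d, idx] = diag / expected  # mirror to keep the matrix symmetric
    return oe

def correlation_matrix(oe):
    """Pearson correlation between the rows of the distance-normalized matrix."""
    return np.corrcoef(oe)

# Toy symmetric matrix with a distance decay plus a small pattern on top.
i, j = np.indices((6, 6))
raw = 10.0 / (1 + np.abs(i - j)) + (i + j) % 3
oe = distance_normalize(raw)
corr = correlation_matrix(oe)
```

After the first step every diagonal has mean 1, so the distance decay that dominates raw counts no longer drives the correlation. Rows with no contacts would yield NaN correlations and would need to be filtered out beforehand in real data.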

Quantile normalization is only to make sure there are no between-sample biases.
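For readers unfamiliar with it, generic quantile normalization across samples can be sketched as follows. This is the textbook procedure, not necessarily how dcHiC implements it; the function name `quantile_normalize` and the layout (rows are genomic bins, columns are samples) are assumptions made for illustration.

```python
import numpy as np

def quantile_normalize(scores):
    """Force every sample (column) to share the same value distribution.

    Each value is replaced by the mean, across samples, of the values holding
    the same rank. Ties are broken by position, which is a simplification.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, axis=0)
    ranks = np.argsort(order, axis=0)                 # rank of each value within its column
    reference = np.sort(scores, axis=0).mean(axis=1)  # mean value at each rank
    return reference[ranks]

# Hypothetical compartment scores: 4 bins (rows) x 3 samples (columns).
scores = np.array([[5.0, 4.0, 3.0],
                   [2.0, 1.0, 4.0],
                   [3.0, 5.0, 6.0],
                   [4.0, 2.0, 8.0]])
normalized = quantile_normalize(scores)
```

The ordering of bins within each sample is unchanged; only the scale of the values is made comparable across samples, which is why this step addresses between-sample differences rather than within-matrix biases.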

Thank you for the clear answer.