Increase in library size differences with CSS normalization

Question

Opened this issue 6 years ago · 1 comment

For some of my projects, I find that CSS normalization decreases the variance in library size relative to the raw library sizes. I assume this is one of the objectives of CSS normalization.

In other microbiome projects, I find that CSS normalization increases the variance in library size. In other words, the ratio of the summed OTU counts of the highest-coverage sample to the summed OTU counts of the lowest-coverage sample can be greater after CSS normalization than it is for the raw library sizes.
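
The ratio described above can be made concrete with a small sketch. This is a simplified pure-Python illustration on made-up counts, not metagenomeSeq's implementation: the sample names and counts are hypothetical, the quantile p = 0.5, the scaling constant of 1000, and the linear-interpolation quantile are all assumptions of the sketch, and the scaling factor is taken as the sum of a sample's nonzero counts at or below its p-th quantile.

```python
def quantile(sorted_vals, p):
    """p-th quantile of an ascending list, using linear interpolation."""
    h = (len(sorted_vals) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def css_normalize(counts, p=0.5, scale=1000):
    """CSS-style scaling (simplified): divide each count by the sum of the
    sample's nonzero counts that are at or below its p-th quantile."""
    nonzero = sorted(c for c in counts if c > 0)
    q = quantile(nonzero, p)
    factor = sum(c for c in nonzero if c <= q)
    return [c * scale / factor for c in counts]


def max_min_ratio(library_sizes):
    """Largest library size divided by the smallest."""
    return max(library_sizes) / min(library_sizes)


# Hypothetical OTU counts per sample (toy data, not from any real study).
samples = {
    "even_low": [5, 5, 5, 5, 5, 5, 5, 5],
    "even_mid": [10, 10, 10, 10, 10, 10, 10, 10],
    "dominated": [1, 1, 1, 1, 2, 2, 2, 790],  # one OTU holds most reads
}

raw_sizes = [sum(counts) for counts in samples.values()]
css_sizes = [sum(css_normalize(counts)) for counts in samples.values()]

raw_ratio = max_min_ratio(raw_sizes)
css_ratio = max_min_ratio(css_sizes)

print("raw max/min library size ratio:", raw_ratio)
print("CSS max/min library size ratio:", css_ratio)
```

In this toy data the ratio grows after scaling, because the sample dominated by a single OTU has a scaling factor that is small relative to its total count. Whether the same mechanism explains a real dataset would need to be checked against the actual counts.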

According to a study by Weiss et al. 2017 ("Normalization and microbial differential abundance strategies depend upon data characteristics", Microbiome), when library sizes differ greatly (>10X), rarefying lowers the false discovery rate.

For instance, in one microbiome study the library size difference was 17X for raw reads and increased to 53X after CSS normalization. Would you expect this to occur, and is it OK to proceed with the CSS normalization? Or does it indicate a problem?

In all of my microbiome datasets (n = 4), p <- cumNormStatFast(MGS) has always returned 'using the default value'.

Any comments would be greatly appreciated.

Answer 1 · 2019-02-04T17:14:59.000Z

Hi @hcorrada @jnpaulson, do you have any insight on the question above? Thank you.