Variation due to library size remains in normalized counts.

Question

Closed this issue a year ago · 1 comment

Good evening,

I have a 10x Genomics CITE-Seq data set, stained with a cocktail of ≈ 130 antibodies. I tried applying the tool, since I like the rationale behind it. There are several isotype controls in the cocktail, and I ran it in the following configuration:

 DSBNormalizeProtein(cell_protein_matrix = counts(ab.sce.x),
                     empty_drop_matrix = counts(sce.empty.x),
                     isotype.control.name.vec = isotype.ctrl,
                     denoise.counts = TRUE, use.isotype.control = TRUE,
                     define.pseudocount = FALSE, quantile.clipping = FALSE,
                     scale.factor = "standardize", return.stats = TRUE)

Unfortunately, I find that variation due to library size remains in the final output. This is exemplified by the UMAP representation of one sample (colour indicates ADT library size).

Many thanks,
M

Answer 1 · 2023-02-20T16:06:52.000Z

Hi @Thapeachydude thanks for your question.

That is by design. In contrast to several mRNA normalization methods, our goal is NOT to remove all the variation due to differences in total protein (protein "library size") between cells. The assumption that total surface protein counts should be the same between cells is not valid, even within the same cell subset. Differences in total protein counts between cells may have a technical component, but they likely also reflect biological differences.

Step II of the method is to estimate and remove technical variation. Take a look at the section in the paper "Shared variance between isotype controls and background protein counts in single cells provide cell-intrinsic normalization factors" for more rationale. https://www.nature.com/articles/s41467-022-29356-8
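
To illustrate the point with a toy example: the base R sketch below simulates log-scale protein values that contain both a per-cell technical factor and a real biological difference, then regresses out a technical covariate. This is only a rough illustration under stated assumptions, not the dsb implementation. The simulated data, the variable names, the use of the per-cell isotype mean as the covariate, and the plain `lm()` regression are all stand-ins chosen for this example; the actual step II derives its factor as described in the paper section above.

```r
# Toy simulation, NOT the dsb implementation. All names and numbers are hypothetical.
set.seed(1)
n_cells <- 500

# Per-cell technical factor on the log scale (e.g. nonspecific binding).
technical <- rnorm(n_cells, mean = 0, sd = 0.5)

# Biological state: half of the cells truly carry more surface protein.
high_expr <- rep(c(FALSE, TRUE), each = n_cells / 2)

# Four isotype controls: technical factor plus noise, no biology.
isotypes <- sapply(1:4, function(i) technical + rnorm(n_cells, sd = 0.2))

# Ten proteins: technical factor + biological shift + noise.
proteins <- sapply(1:10, function(i) {
  technical + ifelse(high_expr, 2, 0) + rnorm(n_cells, sd = 0.2)
})

# Stand-in technical covariate (an assumption): per-cell mean of the isotypes.
tech_estimate <- rowMeans(isotypes)

# Regress the covariate out of each protein, keeping the intercept.
denoised <- apply(proteins, 2, function(y) {
  fit <- lm(y ~ tech_estimate)
  y - coef(fit)[["tech_estimate"]] * tech_estimate
})

# Per-cell totals, a rough analogue of "library size" on this scale.
total_before <- rowSums(proteins)
total_after  <- rowSums(denoised)

# Correlation of totals with the technical factor, before and after.
print(cor(total_before, technical))
print(cor(total_after, technical))

# Mean total per biological group after removing the technical covariate.
print(tapply(total_after, high_expr, mean))
```

In this simulation the correlation between per-cell totals and the technical factor shrinks after the regression, while the totals of the two biological groups stay clearly separated. A UMAP coloured by total signal would therefore still show structure, which is the behaviour described above: the technical part is targeted, and the biological part of "library size" is left in place.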