kieranrcampbell/clonealign

How to quantify the clonal frequency consistency between DNA and CloneAlign?

Puriney opened this issue · 2 comments

Hi,
What do you think would be a proper way to quantify the consistency between DNA and RNA data in terms of clonal frequency?

Without SIDR-seq / G&T-seq technology available, it is still possible to approximate G&T by performing single-cell DNA-seq and RNA-seq on two groups of cells randomly selected from the same cell suspension. In theory, the scDNA-seq and scRNA-seq data should reflect the same cell content, e.g., clones. Therefore, the clonal frequency estimated by DNA-seq and the one inferred by CloneAlign should be similar. This is also what is in your paper:

1152 single-cells post-QC (methods) were assigned to clones A, B, and C with prevalence of 80.6%, 13.8%, and 5.6%, closely matching the expected proportions inferred from the single-cell DNA-seq (82.3%, 10.8%, and 6.9%).

But my question is how to quantify the consistency. Your paper more or less eyeballs the similarity. I was thinking of chisq.test, but I don't know if it makes sense.

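m <- 1152  # cells assigned to clones by clonealign (scRNA-seq)

# Observed clone prevalences from clonealign
pa <- 80.6/100
pb <- 13.8/100
pc <- 5.6/100
# all.equal() tolerates floating-point rounding, unlike ==
stopifnot(isTRUE(all.equal(sum(c(pa, pb, pc)), 1)))

# Expected clone proportions from scDNA-seq
ea <- 82.3/100
eb <- 10.8/100
ec <- 6.9/100
stopifnot(isTRUE(all.equal(sum(c(ea, eb, ec)), 1)))

# Goodness-of-fit test: observed counts against fixed expected proportions
chisq.test(m * c(pa, pb, pc), p = c(ea, eb, ec), rescale.p = FALSE)
# X-squared = 12.826, df = 2, p-value = 0.00164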

The p-value suggests that the clonal frequencies from DNA and CloneAlign are significantly different, which goes against the presumption. I don't mean to challenge the result, because in this specific case I noticed the cells may not come from the same cell suspension, which violates the presumption.
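To make it easier to see where that statistic comes from, here is a minimal sketch that recomputes it by hand from the same numbers. The variable names (`observed`, `expected`, `contrib`) are only illustrative. It breaks the Pearson statistic into per-clone contributions, so one can check which clone drives the difference.

```r
# Same inputs as above: clonealign prevalences and DNA-derived proportions
m <- 1152
observed <- m * c(A = 0.806, B = 0.138, C = 0.056)
expected <- m * c(A = 0.823, B = 0.108, C = 0.069)

# Per-clone contribution to the Pearson statistic: (O - E)^2 / E
contrib <- (observed - expected)^2 / expected
x_squared <- sum(contrib)

# Upper-tail p-value with (number of clones - 1) degrees of freedom
p_value <- pchisq(x_squared, df = length(observed) - 1, lower.tail = FALSE)

print(round(contrib, 3))
print(round(x_squared, 3))
print(signif(p_value, 3))
```

With these inputs, most of the statistic comes from clone B (13.8% versus 10.8%), so the test is mainly reacting to that one clone.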

We linked gene expression to clones in SA501 by generating single-cell RNA-seq from the SA501X2B xenograft passage using 10X genomics (methods) and assigned each cell to a clone (A, B or C) using clonealign.

In sum, my question is how to quantify the clonal frequency consistency rather than judging it by eye.

This is a really good question. Firstly, let's address the chi-square test (with the huge pinch of salt that null-hypothesis significance testing is not my strong point). I think the issue is that the test as written assumes the expected clone proportions are fixed exactly, when in fact they too are estimated from a sample of cells in the DNA-seq data (fewer cells, in fact: 260 from here). So if we modify your code to reflect this, we get:

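m <- 1152  # cells assigned to clones by clonealign (scRNA-seq)

pa <- 80.6/100
pb <- 13.8/100
pc <- 5.6/100
# all.equal() tolerates floating-point rounding, unlike ==
stopifnot(isTRUE(all.equal(sum(c(pa, pb, pc)), 1)))

n <- 260  # cells in the scDNA-seq sample
ea <- 82.3/100
eb <- 10.8/100
ec <- 6.9/100
stopifnot(isTRUE(all.equal(sum(c(ea, eb, ec)), 1)))

# 3 x 2 table of counts: rows are clones A, B, C; columns are RNA and DNA
mat <- cbind(m * c(pa, pb, pc), n * c(ea, eb, ec))

# Round to whole cells, then test whether both samples share the same proportions
chisq.test(round(mat))
# X-squared = 2.1389, df = 2, p-value = 0.3432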
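To illustrate how much a sample of only 260 cells can move the DNA-derived proportions, here is a rough simulation sketch. It assumes the reported DNA proportions are the true population values and that cells are sampled independently (both are simplifications). The seed and the number of simulations are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n <- 260                                        # scDNA-seq cells
dna_prop <- c(A = 0.823, B = 0.108, C = 0.069)  # treated as the truth (assumption)

# Redraw 260 cells many times and record the estimated proportions each time
n_sim <- 10000
draws <- rmultinom(n_sim, size = n, prob = dna_prop) / n
rownames(draws) <- names(dna_prop)

# 95% simulation interval for each clone's estimated prevalence
intervals <- apply(draws, 1, quantile, probs = c(0.025, 0.975))
print(round(intervals, 3))
```

Comparing the clonealign prevalences (80.6%, 13.8%, 5.6%) against these intervals gives a feel for whether the gap is larger than sampling noise in the DNA data alone.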

That aside, some other comments on the significance testing approach:

  1. The cells would need to be "well mixed" before being sent off for 10X and scDNA-seq. We know that's not the case here because they come from different xenograft passages, as you pointed out
  2. We observe scRNA-seq dissociation effects on the cells (paper in prep), which may affect certain clones more than others
  3. clonealign is stochastic by nature and returns clone probabilities, which can change from run-to-run (slightly, hopefully)
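On the last point, one way to carry the assignment uncertainty into the prevalence estimate is to average the per-cell clone probabilities instead of counting hard assignments. The sketch below uses a toy, randomly generated cells x clones matrix called `probs` as a stand-in. It is not real clonealign output, and how to extract such a matrix from a fitted clonealign object is left as an assumption here.

```r
set.seed(42)  # arbitrary seed for the toy data
n_cells <- 1000
clones <- c("A", "B", "C")

# Toy stand-in for a cells x clones probability matrix (NOT real clonealign output)
raw <- matrix(rexp(n_cells * length(clones)), ncol = length(clones),
              dimnames = list(NULL, clones))
probs <- raw / rowSums(raw)  # each row sums to 1

# Hard prevalence: assign each cell to its most probable clone, then count
hard <- factor(clones[max.col(probs, ties.method = "first")], levels = clones)
hard_prev <- prop.table(table(hard))

# Soft prevalence: average the probabilities, keeping the uncertainty
soft_prev <- colMeans(probs)

print(round(hard_prev, 3))
print(round(soft_prev, 3))
```

Repeating this over several clonealign runs and looking at the spread of the resulting prevalences would give a rough sense of the run-to-run variability.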

I hope this helps. Please let me know if you have more comments on the significance testing.

Regarding the cell dissociation in point 2: in my case, I was worried that both scDNA-seq and scRNA-seq would suffer from the same problem if it occurs. But the biology side and the significance test were very well explained. I will revisit the chi-square test. Thanks again.