morrislab/pairtree

Math behind var_read_prob

Closed this issue · 3 comments

Hi Jeff,

So I assume it's like this: CF = VAF/var_read_prob. Correct?

If so, do I have to adjust for purity (p)? I.e., if my estimated var_read_prob is 0.5, do I have to multiply it by p (say 0.8) to arrive at the true var_read_prob = 0.5 * 0.8 = 0.4?

Thanks
Chingiz

Hi Chingiz,

Thanks for your question. Hopefully I am able to provide a comprehensive enough answer using Jeff's thesis as my reference.

CF = VAF/var_read_prob is correct. However, adjusting for purity should not be necessary unless there is some copy number aberration at that locus. All the calculations below refer to some variant j in some sample s.

Let M be the average number of alleles in a cell that contain variant j, and let N be the population-average copy number of the locus containing j. Using these definitions, var_read_prob can be defined as

var_read_prob = M/N

where

N = 2 + (K - 2) * p

with K being the copy number at that locus, and p being the fraction of cells which have a copy number of K at the locus containing j. If the copy number aberration is clonal, p is the purity of the sample. In the case of a normal copy number K = 2, the (K - 2) * p term vanishes, so N = 2 for any value of p. With a single variant allele M = 1 and a correct VAF, var_read_prob = 1/2, so VAF/var_read_prob should provide a good estimate of CF regardless of purity.
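To make the arithmetic concrete, here is a minimal Python sketch of the formulas above. The function names are hypothetical and are not part of Pairtree's code; the VAF of 0.2 and the K = 3 gain are made-up illustration values, and p = 0.8 is taken from the question.

```python
def population_copy_number(K: float, p: float) -> float:
    """N = 2 + (K - 2) * p, where p is the fraction of cells with copy number K."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    return 2.0 + (K - 2.0) * p


def var_read_prob(M: float, K: float, p: float) -> float:
    """var_read_prob = M / N for a variant with M variant-containing alleles."""
    return M / population_copy_number(K, p)


def cellular_fraction(vaf: float, omega: float) -> float:
    """CF = VAF / var_read_prob."""
    return vaf / omega


# Diploid locus (K = 2): N is 2 whatever p is, so purity drops out.
omega_diploid = var_read_prob(M=1, K=2, p=0.8)   # 1 / 2
cf_diploid = cellular_fraction(vaf=0.2, omega=omega_diploid)

# Hypothetical clonal gain (K = 3) in a sample with purity 0.8:
# N = 2 + (3 - 2) * 0.8 = 2.8, so purity now matters.
omega_gain = var_read_prob(M=1, K=3, p=0.8)      # 1 / 2.8
cf_gain = cellular_fraction(vaf=0.2, omega=omega_gain)

print(omega_diploid, cf_diploid)
print(omega_gain, cf_gain)
```

In the diploid case, changing p leaves var_read_prob at 0.5, which is why no extra purity adjustment is needed there. Purity only enters through N when K differs from 2.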

Please let me know if any of this is unclear.

Ethan

Great, thanks Ethan! Any chance I can access Jeff's thesis? Is it public?

Absolutely - a version of it can be accessed here: https://www.biorxiv.org/content/10.1101/2020.11.06.372219v2