Dynast count error:
Closed this issue · 7 comments
Despite using Dynast Count on the same data, there seems to be a difference when running Dynast with "TC,GA", "GA", and "TC". It seems that there's an increase in the conversion you call for (i.e., higher TC when you look for TC).
Hi, @MartinaVillanueva,
What exactly are you plotting here?
Are these the mutation rates?
Assuming those are what you are plotting here, it is likely due to how UMI deduplication works. When multiple reads with the same cell barcode and UMI that map to the same gene are observed, the read with the most conversions of interest is selected.
Thanks! My understanding is that when Martin calls for TC,GA (with the --conversion TC,GA argument in dynast count), the TC and GA mutation rates are different from when you call for TC or GA separately (with the --conversion TC or --conversion GA argument in dynast count). And when calling for TC or GA, the corresponding TC/GA mutation rate is higher than when you don't look for it. Is there any special treatment for the mutation you ask for (via --conversion) compared to the rest of the mutations?
Exactly @Xiaojieqiu! Does that make sense @Lioscro ?
Take a look at the GA conversion and how it is lower when we don't look for the conversion (last slide) vs. when we do look for it (the top 2 slides).
I see what you mean. This is because in the UMI deduplication step, which read is selected depends on the number of conversions (see my previous comment). When you supply --conversion TC,GA, the read with the most TC+GA conversions is selected; when you supply --conversion GA, the read with the most GA conversions is selected; and likewise, when you supply --conversion TC, the read with the most TC conversions is selected.
(To be exact, the order of priority is 1) the read that maps to the transcriptome (exon only), 2) the read that has the highest alignment score, 3) the read with the highest sum of the provided --conversion.)
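To make the priority order above concrete, here is a minimal sketch of the selection rule in Python. It is not dynast's actual code: the Read class, its field names, the select_read helper, and the three example reads are all hypothetical and chosen only to illustrate the three tie-breaking steps for a single cell barcode + UMI + gene group.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical stand-in for one aligned read in a barcode/UMI/gene group."""
    name: str
    exon_only: bool        # True if the read maps to the transcriptome (exon only)
    alignment_score: int
    conversions: dict      # conversion type -> count, e.g. {"TC": 2, "GA": 0}


def select_read(reads, conversions_of_interest):
    """Pick one read per UMI using the priority order described above:
    1) exon-only reads first, 2) highest alignment score,
    3) highest sum of the requested conversions."""
    def priority(read):
        requested = sum(read.conversions.get(c, 0) for c in conversions_of_interest)
        return (read.exon_only, read.alignment_score, requested)

    return max(reads, key=priority)


# Three made-up reads that tie on steps 1 and 2, so step 3 decides.
reads = [
    Read("r1", False, 98, {"TC": 3, "GA": 0}),
    Read("r2", False, 98, {"TC": 0, "GA": 3}),
    Read("r3", False, 98, {"TC": 2, "GA": 2}),
]

selected = {
    "TC": select_read(reads, ["TC"]),
    "GA": select_read(reads, ["GA"]),
    "TC,GA": select_read(reads, ["TC", "GA"]),
}

for setting, read in selected.items():
    print(setting, "->", read.name, read.conversions)
```

In this toy group, each setting keeps a different read, so the GA count that survives deduplication is 0 with TC, 3 with GA, and 2 with TC,GA. That is the kind of shift in both the requested conversion and the others that the plots show.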
Does that make sense? So it seems that you have many reads per UMI that map to the same gene, do not map to exons only, have (equal) maximum alignment score, but have quite different conversion numbers.
I see. So the reason we see changes in the other conversions (see blue and yellow circles) is that selecting reads based on the conversion of interest also changes which reads make up the background. Is that right?
Would you expect this to affect the accuracy of calling new / old transcripts?
210809_dual_labeling_2.pdf