ryanlayer/samplot

Question about plot interpretation; large putative inversion

Opened this issue · 3 comments

Hello --

Thank you for developing and maintaining a powerful resource for the genomics community.

I've genotyped SVs for a species of bird and noticed an aberrantly large inversion that I suspected was a false positive. To check, I produced a samplot image for individuals of each of the three genotypes (0/0, 0/1, 1/1). The genotype of each individual is indicated.

I have a few questions regarding interpretation. First, for inversions, is there a diagnostic for discriminating between the three genotypes (such as is done with differences in read coverage for deletions)? For example, would the number of discordant paired-end reads spanning the inversion be suggestive of different genotypic states?

Second, it appears the main signal of an inversion here is a single pair of discordant reads. Am I interpreting this correctly? If so, that single pair seems to map consistently to the same locations on the reference genome, regardless of the called genotype. There also appears to be a second pair of discordant reads for COL_52524, but that doesn't seem to support an inversion.

Any suggestions on how to best interpret these results would be greatly appreciated.

Thanks
Jack
[Image: samplot plot, 4_14676614_57494906]

The signal you're seeing here, blue discordant pairs indicating an inversion, is actually many pairs with about the same placement. Samplot has no great way to separate these visually, but you can tell there are several because the blue is fairly dark. There are also faint dotted lines indicating chimeric alignments (in addition to the discordant pairs). So it's not a single discordant pair, but it's hard to tell how many there are beyond "several". This relates to your genotype question: genotyping inversions isn't easy, and samplot doesn't really try to do it. If you extract the split alignments and pairs that span this breakpoint, you could come up with a count that might be useful for estimating genotype, but it's not as simple as the rules of thumb that work for copy number variation.
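For anyone who wants to try the counting approach, here is a rough sketch of how it could look with pysam. It is not something samplot does. The function names, the 500 bp window, and the 1000 bp minimum span are all assumptions for illustration, and it assumes a coordinate-sorted, indexed BAM. It treats a primary read whose mate maps to the same strand on the same contig, far away, as inversion-like, and counts reads carrying an `SA` tag as split alignments without checking where the other segment lands.

```python
def is_inversion_like_pair(read, min_span=1000):
    """True for a primary read whose mate maps to the same strand,
    on the same contig, at least `min_span` bp away (hypothetical cutoff)."""
    if not read.is_paired or read.is_unmapped or read.mate_is_unmapped:
        return False
    if read.is_secondary or read.is_supplementary or read.is_duplicate:
        return False
    if read.reference_id != read.next_reference_id:
        return False
    same_strand = read.is_reverse == read.mate_is_reverse
    span = abs(read.next_reference_start - read.reference_start)
    return same_strand and span >= min_span


def count_breakpoint_support(reads, min_span=1000):
    """Count distinct fragments (by read name) with inversion-like pairing
    or with a split alignment (SA tag present)."""
    pair_names, split_names = set(), set()
    for read in reads:
        if is_inversion_like_pair(read, min_span):
            pair_names.add(read.query_name)
        if read.has_tag("SA") and not read.is_secondary and not read.is_duplicate:
            split_names.add(read.query_name)
    return {"discordant_pairs": len(pair_names), "split_reads": len(split_names)}


def count_from_bam(bam_path, contig, breakpoint, window=500, min_span=1000):
    """Apply the counts to reads within `window` bp of one breakpoint."""
    import pysam  # imported here so the counting logic runs without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        start = max(0, breakpoint - window)
        reads = bam.fetch(contig, start, breakpoint + window)
        return count_breakpoint_support(reads, min_span)
```

You would call `count_from_bam` once per breakpoint and per sample, then compare the counts across the 0/0, 0/1 and 1/1 individuals. Turning those counts into a genotype still needs a comparison against reference-supporting reads at the same positions, which this sketch does not attempt.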

That makes sense, thanks. Good to know the signal appears to be strong, and perhaps indicative of a real inversion. Curiously, I ran a PCA of the SNPs located within it and didn't recover the expected signal -- individuals segregated by geography, not by zygosity. Will be interesting to dig into this further. Thanks again for your help.

Hello, yes, I am having the same challenge: multiple reads supporting an event aren't easily distinguished. I usually inspect and count them in IGV, but I was wondering whether you have considered stacking them in some way so that they are not displayed on top of each other. When they are plotted on top of each other, not only is their number impossible to tell, but split reads can also be hidden.
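To make the stacking idea concrete, here is a minimal sketch of one way it could work: a greedy interval-packing pass that gives each pair the lowest row where it does not overlap anything already placed. This is not samplot's layout code. The function name `assign_rows` and the `gap` parameter are made up for illustration, and each pair is reduced to a plain `(start, end)` tuple.

```python
def assign_rows(intervals, gap=0):
    """Assign each (start, end) interval the lowest row in which it does not
    overlap a previously placed interval. Returns one row index per input
    interval, in the input order."""
    row_ends = []  # end coordinate of the last interval placed in each row
    rows = [None] * len(intervals)
    # Process intervals left to right so each row only needs its last end.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    for i in order:
        start, end = intervals[i]
        for row, last_end in enumerate(row_ends):
            if start > last_end + gap:
                row_ends[row] = end
                rows[i] = row
                break
        else:
            # No existing row has room, so open a new one.
            row_ends.append(end)
            rows[i] = len(row_ends) - 1
    return rows
```

With near-identical pairs like the ones in this plot, every pair overlaps every other, so each one lands on its own row and the number of rows equals the number of supporting pairs. The row index could then be used as a small vertical offset when drawing.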