only use discordant reads
Yiguan opened this issue · 2 comments
Hi,
When a region has many discordant reads but only a few split reads, I find that the inferred breakpoints rely too heavily on the split reads.
For example,
3L 500033 2397 N <INV> 550.60 . SVTYPE=INV;SVLEN=1343;END=501376;STRANDS=++:38,--:30;IMPRECISE;CIPOS=-7,9;CIEND=-10,9;CIPOS95=-1,5;CIEND95=-3,1;SU=68;PE=67;SR=1
The region has 67 discordant reads and 1 split read. The inferred breakpoint, 500033, relies heavily on that single split read, but I tend to believe the split read is a mapping error or misalignment.
Is there a way to exclude the split reads and use only the discordant reads to infer inversions? Alternatively, can I set a threshold for the minimum number of split reads, e.g. use the split reads only when there are more than 5 of them?
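To make the threshold idea concrete, here is a minimal Python sketch that flags calls whose split-read support is present but sparse. It only post-filters existing VCF records using the `SR` tag shown above; it does not change how the caller infers breakpoints. The function names and the `min_sr` parameter are hypothetical, not options of any tool.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; flag-only keys (e.g. IMPRECISE) map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def relies_on_sparse_split_reads(info, min_sr=5):
    """Return True when a call has split reads, but no more than min_sr of them.

    min_sr is a hypothetical threshold: split reads would only be
    trusted when SR > min_sr.
    """
    sr = int(parse_info(info).get("SR", 0))
    return 0 < sr <= min_sr


# INFO column of the record above.
record_info = (
    "SVTYPE=INV;SVLEN=1343;END=501376;STRANDS=++:38,--:30;IMPRECISE;"
    "CIPOS=-7,9;CIEND=-10,9;CIPOS95=-1,5;CIEND95=-3,1;SU=68;PE=67;SR=1"
)
print(relies_on_sparse_split_reads(record_info))  # True
```

Calls flagged this way could then be re-examined or re-called without the split-read evidence.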
Thanks,
The mapping quality of the read is 60, and its CIGAR string is "49H41M".
Here is the IGV for the read (view as pair):
https://github.com/Yiguan/miscellaneous/blob/main/aa.png
The screenshot shows four tracks from three samples:
sample_95 discordant track
sample_95 split track
sample_88 discordant track
sample_62 discordant track
sample_95 3L 500033 2397 N <INV> . . SVTYPE=INV;STRANDS=++:38,--:30;SVLEN=1343;END=501376;CIPOS=-7,9;CIEND=-10,9;CIPOS95=-1,5;CIEND95=-3,1;IMPRECISE;SU=68;PE=67;SR=1 GT:SU:PE:SR ./.:68:67:1
sample_88 3L 500334 2221 N <INV> . . SVTYPE=INV;STRANDS=++:23,--:40;SVLEN=771;END=501105;CIPOS=-174,9;CIEND=-40,9;CIPOS95=-122,0;CIEND95=-30,0;IMPRECISE;SU=63;PE=63;SR=0 GT:SU:PE:SR ./.:63:63:0
sample_62 3L 500306 2541 N <INV> . . SVTYPE=INV;STRANDS=++:21,--:22;SVLEN=838;END=501144;CIPOS=-178,9;CIEND=-143,11;CIPOS95=-126,0;CIEND95=-128,2;IMPRECISE;SU=43;PE=43;SR=0 GT:SU:PE:SR ./.:43:43:0
In sample_95, the first breakpoint of the inversion was inferred in "region1" (in the screenshot), whereas in sample_88 and sample_62 the first breakpoint was placed in "region2".
As the three samples are siblings from the same family, I believe they should carry the same inversion, so "region1" is unlikely to be a true breakpoint.
I am just confused about why sample_95 has a breakpoint in "region1". I suspect it is caused by the split read (second track in the screenshot), since that read falls within "region1".
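The disagreement can also be checked numerically. The sketch below is only an illustration: it takes POS and CIPOS from the three records above, converts each into an absolute interval for the first breakpoint, and tests whether the intervals overlap. The helper names are hypothetical.

```python
def breakpoint_interval(pos, cipos):
    """Turn POS plus a 'lo,hi' CIPOS string into an absolute (start, end) interval."""
    lo, hi = (int(x) for x in cipos.split(","))
    return pos + lo, pos + hi


def overlaps(a, b):
    """True if two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


# POS and CIPOS copied from the three VCF records above.
calls = {
    "sample_95": (500033, "-7,9"),
    "sample_88": (500334, "-174,9"),
    "sample_62": (500306, "-178,9"),
}
intervals = {name: breakpoint_interval(pos, ci) for name, (pos, ci) in calls.items()}

for name, interval in intervals.items():
    print(name, interval)
# sample_95 (500026, 500042)
# sample_88 (500160, 500343)
# sample_62 (500128, 500315)

print(overlaps(intervals["sample_88"], intervals["sample_62"]))  # True
print(overlaps(intervals["sample_95"], intervals["sample_88"]))  # False
print(overlaps(intervals["sample_95"], intervals["sample_62"]))  # False
```

The intervals for sample_88 and sample_62 overlap, while the much narrower interval for sample_95 lies more than 80 bp upstream of both, which is consistent with the single split read pinning the breakpoint in "region1".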