HKU-BAL/Clair

How to handle such "duplicate" calls?

SHuang-Broad opened this issue · 3 comments

Hi,

I am using Clair on my raw ONT reads (human) and found the following two calls that I don't know how to handle. They essentially differ by only one base, so I hope there is a "merge" module in Clair to post-process such "duplicate" calls.

chr1	25808786	.	GGAGGTGAGGACAGCTGGGGTGCGACGTGGGGCCCCTCC	G	691	.	.	GT:GQ:DP:AF	0/1:691:31:0.2258
chr1	25808787	.	GAGGTGAGGACAGCTGGGGTGCGACGTGGGGCCCCTCCGC	G	582	.	.	GT:GQ:DP:AF	0/1:582:31:0.3548
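To make "differ by one base" concrete, here is a small worked check of which reference bases each record deletes. It is a sketch under the usual VCF convention that ALT is the left padding base and a prefix of REF. The helper name `deleted_interval` is hypothetical and not part of Clair.

```python
from __future__ import annotations


def deleted_interval(pos: int, ref: str, alt: str) -> tuple[int, int]:
    """Hypothetical helper: 1-based inclusive span of bases removed by a simple deletion.

    Assumes ALT is a prefix of REF (the left-anchored padding base).
    """
    assert ref.startswith(alt) and len(ref) > len(alt), "not a simple deletion"
    start = pos + len(alt)     # first deleted base follows the padding base
    end = pos + len(ref) - 1   # last base covered by REF
    return start, end


# The two records from the report (CHROM, POS, REF, ALT).
calls = [
    ("chr1", 25808786, "GGAGGTGAGGACAGCTGGGGTGCGACGTGGGGCCCCTCC", "G"),
    ("chr1", 25808787, "GAGGTGAGGACAGCTGGGGTGCGACGTGGGGCCCCTCCGC", "G"),
]

spans = [deleted_interval(pos, ref, alt) for _, pos, ref, alt in calls]
(s1, e1), (s2, e2) = spans
overlap = max(0, min(e1, e2) - max(s1, s2) + 1)  # bases deleted by both calls

for s, e in spans:
    print(f"deleted {s}-{e} ({e - s + 1} bp)")
print(f"shared bases: {overlap}")
```

For these two records it reports deletions of 38 bp (25808787-25808824) and 39 bp (25808788-25808826) that share 37 bases.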

[Screenshot from 2020-02-09, not included]

Unfortunately, I cannot share the data.

Thanks!

We will investigate and look for a method to distinguish duplicates induced by homopolymer errors from genuinely overlapping variants.

A quick workaround would be to retain only one copy (preferably the leftmost) of overlapping variants.
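The leftmost-copy workaround can be sketched as a streaming filter over VCF lines. This is a minimal illustration, not Clair's own post-processing: the function name `keep_leftmost` is hypothetical, it assumes a coordinate-sorted VCF, and it judges overlap on the REF span `POS..POS+len(REF)-1`, which includes the padding base of indels.

```python
from typing import Iterable, Iterator


def keep_leftmost(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical filter: among overlapping records, keep only the leftmost one.

    Assumes records are sorted by position within each chromosome.
    """
    last_chrom, last_end = None, -1
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # pass header lines through untouched
            continue
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        start = int(pos)
        if chrom == last_chrom and start <= last_end:
            continue  # overlaps the record already kept: drop this copy
        last_chrom, last_end = chrom, start + len(ref) - 1
        yield line


vcf = [
    "chr1\t25808786\t.\tGGAGGTGAGGACAGCTGGGGTGCGACGTGGGGCCCCTCC\tG\t691\t.\t.\tGT:GQ:DP:AF\t0/1:691:31:0.2258\n",
    "chr1\t25808787\t.\tGAGGTGAGGACAGCTGGGGTGCGACGTGGGGCCCCTCCGC\tG\t582\t.\t.\tGT:GQ:DP:AF\t0/1:582:31:0.3548\n",
]
kept = list(keep_leftmost(vcf))
print(len(kept))  # only the record at 25808786 survives
```

The filter is greedy: a record is compared only against the last record that was kept. In this example the leftmost call also happens to have the higher QUAL (691 vs 582); keeping the highest-QUAL record per overlapping cluster instead would need buffering the cluster before emitting it.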

Thanks!

Yes, that's a quick workaround, although one can imagine that some kind of threshold would need to be defined to decide when two variants should be considered the same.
The case becomes more complicated when insertions are involved, as one needs to compare the inserted sequences as well.
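One way to express such a threshold is sketched below, purely as an illustration: deletions are treated as the same event when their deleted intervals reach a minimum reciprocal overlap, and insertions when they lie within a small distance and their inserted sequences are similar according to Python's `difflib.SequenceMatcher.ratio()`. The function names and the thresholds (`min_recip_overlap=0.9`, `max_dist=10`, `min_seq_ratio=0.9`) are hypothetical choices, not values from Clair.

```python
from difflib import SequenceMatcher


def indel_kind(ref: str, alt: str) -> str:
    """Classify a left-padded indel; anything else is 'OTHER'."""
    if len(ref) > len(alt) and ref.startswith(alt):
        return "DEL"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "INS"
    return "OTHER"


def same_event(a, b, min_recip_overlap=0.9, max_dist=10, min_seq_ratio=0.9) -> bool:
    """Hypothetical test: are two (pos, ref, alt) indels on one chromosome the same event?"""
    (pa, ra, aa), (pb, rb, ab) = a, b
    kind = indel_kind(ra, aa)
    if kind != indel_kind(rb, ab) or kind == "OTHER":
        return False
    if kind == "DEL":
        # Deleted intervals (1-based, inclusive), excluding the padding base.
        sa, ea = pa + len(aa), pa + len(ra) - 1
        sb, eb = pb + len(ab), pb + len(rb) - 1
        overlap = max(0, min(ea, eb) - max(sa, sb) + 1)
        # Reciprocal overlap: the shared length must cover enough of the longer one.
        return overlap / max(ea - sa + 1, eb - sb + 1) >= min_recip_overlap
    # INS: positions must be close and the inserted sequences similar.
    ins_a, ins_b = aa[len(ra):], ab[len(rb):]
    ratio = SequenceMatcher(None, ins_a, ins_b).ratio()
    return abs(pa - pb) <= max_dist and ratio >= min_seq_ratio


first = (25808786, "GGAGGTGAGGACAGCTGGGGTGCGACGTGGGGCCCCTCC", "G")
second = (25808787, "GAGGTGAGGACAGCTGGGGTGCGACGTGGGGCCCCTCCGC", "G")
print(same_event(first, second))  # True: 37 shared of 39 bp, about 0.95
```

The position window for insertions is deliberately crude; a more careful comparison would first left-normalize both calls so that equivalent insertions in repeats land at the same position.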

We have now added a simple way to deal with those variants; see https://github.com/HKU-BAL/Clair/blob/master/docs/POST_PROCESSING.md#handle-overlapping-variants