HKU-BAL/Clair

Specific use case


Hello,

I would like to know if Clair would be suitable for the following use case:

  • I have a 100 Mb haploid genome onto which I mapped Illumina 250 bp reads, and then called SNPs with DeepVariant.
    Now I want to know which GQ threshold to use to filter my variants. Note that the genome is variant-dense. For DeepVariant this mattered: I had to change the trained model, otherwise it did not call the SNPs.

As it happens, I now also have a phased diploid assembly of the genome from PacBio HiFi.

My logic is the following:
map the phased diploid PacBio HiFi assembly onto the haploid reference, and check whether each SNP called from the Illumina reads corresponds to a heterozygous site in the phased diploid HiFi assembly. That sounds simple, but it turns out that most tools are not designed to call SNPs at 2x coverage (when mapping the HiFi assembly, the coverage is of course at most 2x), so they return nothing.

Would Clair be able to do that, i.e. call a SNP supported by only one of the two sequences?
Example

haploid genome:  ATGGCGTA
SNP call:        ATCGCGTA
HiFi assembly:   ATGGCGTAblablabla
                 ATCGCGTAblablabla

Would Clair report the G/C heterozygous site?
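The check described above can be sketched on the toy example by comparing bases directly rather than calling variants from read depth. This is a minimal, hypothetical sketch, not Clair's or any other tool's method: it assumes both haplotype sequences are already laid out in reference coordinates with no indels (real data would first require parsing the assembly-to-reference alignment), and `find_snps` and `classify_snp` are illustrative names. Positions are 0-based.

```python
from typing import List, Tuple


def find_snps(ref: str, called: str) -> List[Tuple[int, str]]:
    """Return (0-based position, alt base) for every mismatch between ref and called."""
    return [(i, b) for i, (r, b) in enumerate(zip(ref, called)) if r != b]


def classify_snp(ref: str, hap1: str, hap2: str, pos: int, alt: str) -> str:
    """Classify one SNP by how many of the two haplotypes carry the alt base.

    Assumption: hap1 and hap2 share the reference coordinate system (no indels).
    """
    alleles = (hap1[pos], hap2[pos])
    n_alt = alleles.count(alt)
    if n_alt == 2:
        return "homozygous_alt"
    if n_alt == 1:
        # The haplotype without the alt allele should carry the reference base.
        other = alleles[0] if alleles[1] == alt else alleles[1]
        return "heterozygous" if other == ref[pos] else "heterozygous_other_allele"
    # Neither haplotype supports the SNP called from the Illumina reads.
    return "not_supported"


ref = "ATGGCGTA"
called = "ATCGCGTA"
hap1 = "ATGGCGTAblablabla"
hap2 = "ATCGCGTAblablabla"

for pos, alt in find_snps(ref, called):
    print(pos, ref[pos], alt, classify_snp(ref, hap1, hap2, pos, alt))
# prints: 2 G C heterozygous
```

Because each haplotype contributes exactly one base, a site supported by one sequence out of two is reported as heterozygous without any minimum-depth requirement. VCF positions are 1-based, so they would need to be shifted by one before indexing into these strings.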

Thanks a lot

Clair calls variants from raw reads and requires a minimum depth of 4, so I don't think it would support your use case. From your description, WhatsHap's regenotyping function or hapdip might help (although both are still read-based).