HKU-BAL/Clair

--haploid setting less sensitive than default

NatPRoach opened this issue · 3 comments

Hello,
I'm not sure if this is intended behavior, but the --haploid flag appears to be significantly less sensitive at detecting variants than the default setting. Looking at the code in call_var.py, this appears to be caused by the following snippet:

        if output_config.is_haploid_mode_enabled:
            if (
                is_hetero_SNP or is_hetero_ACGT_Ins or is_hetero_InsIns or
                is_hetero_ACGT_Del or is_hetero_DelDel or is_insertion_and_deletion
            ):
                return (
                    (True, False, False, False, False, False, False, False, False, False),
                    (reference_base_ACGT, reference_base_ACGT)
                )

Which, based on the later return statement in that function,

    return (
        (
            is_reference, is_homo_SNP, is_hetero_SNP,
            is_homo_insertion, is_hetero_ACGT_Ins, is_hetero_InsIns,
            is_homo_deletion, is_hetero_ACGT_Del, is_hetero_DelDel,
            is_insertion_and_deletion
        ),
        (reference_base, alternate_base)
    )

seems to mean that any time a heterozygous variant is called, the function defaults to reporting the reference allele when in --haploid mode.
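
To make the effect concrete, here is a minimal standalone sketch of that collapsing step. It is not the actual Clair code: `GenotypeFlags` and `collapse_for_haploid` are hypothetical names introduced for illustration, and only the flag names and the shape of the returned tuple are taken from the snippets above.

```python
from typing import NamedTuple, Tuple


class GenotypeFlags(NamedTuple):
    # Field order mirrors the tuple in the later return statement quoted above.
    is_reference: bool
    is_homo_SNP: bool
    is_hetero_SNP: bool
    is_homo_insertion: bool
    is_hetero_ACGT_Ins: bool
    is_hetero_InsIns: bool
    is_homo_deletion: bool
    is_hetero_ACGT_Del: bool
    is_hetero_DelDel: bool
    is_insertion_and_deletion: bool


def collapse_for_haploid(
    flags: GenotypeFlags, reference_base: str, alternate_base: str
) -> Tuple[GenotypeFlags, Tuple[str, str]]:
    """Hypothetical helper: mimic the --haploid branch quoted above."""
    is_hetero_like = (
        flags.is_hetero_SNP or flags.is_hetero_ACGT_Ins or flags.is_hetero_InsIns
        or flags.is_hetero_ACGT_Del or flags.is_hetero_DelDel
        or flags.is_insertion_and_deletion
    )
    if is_hetero_like:
        # Any heterozygous-like call is rewritten as a plain reference call.
        reference_call = GenotypeFlags(True, *([False] * 9))
        return reference_call, (reference_base, reference_base)
    # Reference and homozygous calls pass through unchanged.
    return flags, (reference_base, alternate_base)


# A heterozygous SNP (A -> G) is reported as reference in this sketch.
het_snp = GenotypeFlags(False, False, True, False, False, False, False, False, False, False)
flags, bases = collapse_for_haploid(het_snp, "A", "G")
print(flags.is_reference, bases)  # True ('A', 'A')
```

In this sketch a homozygous SNP would still be returned unchanged, so only the heterozygous-like calls are lost.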

My expectation was that --haploid mode would be more sensitive at detecting low-frequency variants, rather than defaulting to the reference more often. If this is intended behavior, it may be worth clarifying in the --help text what --haploid mode does behind the scenes and what assumptions it makes.
Thanks!

Thanks for the suggestion. Maybe I should provide two modes, --haploid_sensitive and --haploid_accurate.

Two new modes added. --haploid_precision will treat heterozygous-like positions as non-variant. --haploid_sensitive will treat heterozygous-like positions as variant.
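
As a rough illustration of the difference between the two modes, the decision could be sketched as below. This is an assumption-laden sketch, not Clair's implementation: `is_reported_as_variant` and the mode strings `"precision"` and `"sensitive"` are hypothetical, and how a heterozygous-like call is actually genotyped in the sensitive mode is not specified here.

```python
def is_reported_as_variant(is_reference: bool, is_hetero_like: bool, haploid_mode: str) -> bool:
    """Hypothetical helper: decide whether a position is reported as a variant.

    haploid_mode is "precision" or "sensitive" (illustrative names standing in
    for --haploid_precision and --haploid_sensitive).
    """
    if haploid_mode not in ("precision", "sensitive"):
        raise ValueError(f"unknown haploid mode: {haploid_mode!r}")
    if is_reference:
        # A reference call is never a variant in either mode.
        return False
    if not is_hetero_like:
        # Homozygous calls are variants in either mode.
        return True
    # Heterozygous-like calls are where the two modes differ.
    return haploid_mode == "sensitive"


for mode in ("precision", "sensitive"):
    print(mode, is_reported_as_variant(False, True, mode))
# precision False
# sensitive True
```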

Awesome, thanks for the quick turnaround on this!