exclude haplotypes if they don't fit within a `--region` perfectly
Current behavior
Users can specify a `--region` for the `transform` or `simphenotype` command. This will subset to just the haplotypes within a provided region.
But what if a haplotype doesn't exactly fit within the region? Currently, it still gets included, but its variants are truncated so that it contains only those variants that fit within the region.
haptools/haptools/data/haplotypes.py
Lines 871 to 878 in 05fcce1
This behavior is undesirable for a number of reasons, not least because it's probably unexpected.
Desired behavior
If the haplotype doesn't exactly fit within the region, we should just exclude it here:
haptools/haptools/data/haplotypes.py
Line 848 in 05fcce1
We might already do this, actually. It just depends on what the default behavior for `pysam.TabixFile.fetch()` is. If it's similar to the `tabix` command, then it should return anything that overlaps, regardless of whether it fits perfectly.
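
To make the distinction concrete, here is a minimal sketch of the two checks: "overlaps the region" (what an overlap-style query would return) versus "fits entirely within the region" (the desired filter). The `Region` and `Hap` tuples and the helper names are hypothetical and are not taken from `haplotypes.py`; coordinates are assumed to be half-open, which may not match what haptools uses internally.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    # Hypothetical stand-in for a parsed --region value (assumed half-open)
    chrom: str
    start: int
    end: int


class Hap(NamedTuple):
    # Hypothetical stand-in for a haplotype record
    id: str
    chrom: str
    start: int
    end: int


def overlaps(hap: Hap, region: Region) -> bool:
    """True if the haplotype shares at least one position with the region."""
    return (
        hap.chrom == region.chrom
        and hap.start < region.end
        and hap.end > region.start
    )


def fits_within(hap: Hap, region: Region) -> bool:
    """True only if the haplotype lies entirely inside the region."""
    return (
        hap.chrom == region.chrom
        and hap.start >= region.start
        and hap.end <= region.end
    )


def exclude_partial(haps: List[Hap], region: Region) -> List[Hap]:
    """Drop any haplotype that does not fit entirely within the region."""
    return [h for h in haps if fits_within(h, region)]


haps = [
    Hap("H1", "1", 100, 200),  # entirely inside the region
    Hap("H2", "1", 150, 350),  # overlaps but extends past the region end
    Hap("H3", "1", 400, 500),  # entirely outside the region
]
region = Region("1", 100, 300)

overlapping = [h.id for h in haps if overlaps(h, region)]
kept = [h.id for h in exclude_partial(haps, region)]
```

If the fetch step does return everything that overlaps, a containment check along these lines would still need to be applied afterward to drop haplotypes like `H2`.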