g_to_c w/strict_bounds=False results in invalid ref
Closed this issue · 0 comments
reece commented
>>> import hgvs
>>> from hgvs.easy import hp, am37
>>> var_g = hp.parse('NC_000006.11:g.33415675_33422480del')
>>> var_c = am37.g_to_c(var_g, "NM_006772.2")
...
HGVSInvalidIntervalError: Position is beyond the bounds of transcript record
>>> hgvs.global_config.mapping.strict_bounds = False
>>> var_c = am37.g_to_c(var_g, "NM_006772.2")
HGVSInvalidVariantError: NM_006772.2:c.3850_*2797del: Variant reference
(CTGCCAGAACCCAAGAAGAGGCTGCTCGACGCTCAGGTGGAA...)
does not agree with reference sequence
(CTGCCAGAACCCAAGAAGAGGCTGCTCGACGCTCAGGAGAGG...)
(Edited to show the sequence divergence near the end of the excerpt; the sequences continue to diverge beyond it.)
This appears to be a post-projection validation that straddles the entire 3' UTR and falls off the end of the transcript.
At the very least, an out-of-bounds variant should raise an error when strict_bounds=True (as it does), or not be validated at all when strict_bounds=False.
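To make the requested behavior concrete, here is a minimal sketch of the decision I have in mind. The names `OutOfBoundsError` and `should_validate_ref` are hypothetical and not part of the hgvs API, and the sketch assumes simple 1-based transcript coordinates rather than c. positions:

```python
class OutOfBoundsError(ValueError):
    """Hypothetical error for intervals that extend past the transcript."""


def should_validate_ref(start, end, tx_length, strict_bounds=True):
    """Decide whether reference validation should run for a projected interval.

    Assumes 1-based, inclusive transcript coordinates.
    Returns True when the interval lies within the transcript, so the
    reference sequence can be checked. For an out-of-bounds interval,
    raises OutOfBoundsError if strict_bounds is True, otherwise returns
    False so that the caller skips validation.
    """
    in_bounds = 1 <= start <= end <= tx_length
    if in_bounds:
        return True
    if strict_bounds:
        raise OutOfBoundsError(
            f"interval {start}_{end} is beyond transcript length {tx_length}"
        )
    # Out of bounds with strict_bounds=False: there is no transcript
    # sequence to compare against, so validation is skipped.
    return False
```

With this shape, the post-projection check would run only when `should_validate_ref(...)` returns True, so the strict_bounds=False case above would return the projected variant without comparing reference sequences.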
Also: the imputed genomic sequence in the above error diverges early, before the end of the transcript. Why is that? (Is c.3850 not in the last exon?)