biocommons/hgvs

g_to_c w/strict_bounds=False results in invalid ref

Closed this issue · 0 comments

reece commented
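>>> import hgvs
>>> import hgvs.assemblymapper, hgvs.dataproviders.uta, hgvs.parser
>>> # Setup not shown in the original report; assumed: a standard parser and a GRCh37 mapper
>>> hp = hgvs.parser.Parser()
>>> am37 = hgvs.assemblymapper.AssemblyMapper(hgvs.dataproviders.uta.connect(), assembly_name="GRCh37")

>>> var_g = hp.parse('NC_000006.11:g.33415675_33422480del')
>>> var_c = am37.g_to_c(var_g, "NM_006772.2")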
...
HGVSInvalidIntervalError: Position is beyond the bounds of transcript record

>>> hgvs.global_config.mapping.strict_bounds = False
>>> var_c = am37.g_to_c(var_g, "NM_006772.2")
HGVSInvalidVariantError: NM_006772.2:c.3850_*2797del: Variant reference
(CTGCCAGAACCCAAGAAGAGGCTGCTCGACGCTCAGGTGGAA...)
does not agree with reference sequence
(CTGCCAGAACCCAAGAAGAGGCTGCTCGACGCTCAGGAGAGG...)

(Edited to show the sequence divergence near the end of the excerpt; the sequences continue to diverge beyond it.)
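To pin down where the two excerpts first disagree, a small standalone helper is enough. This is only a sketch for reading the error message: `first_divergence` is a hypothetical name, not part of hgvs, and the two strings are the truncated excerpts copied from the error above.

```python
def first_divergence(a: str, b: str) -> int:
    """Return the 0-based index of the first mismatch, or -1 if none is found.

    Only the overlapping prefix is compared, because both excerpts in the
    error message are truncated.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1


# Excerpts copied from the HGVSInvalidVariantError message above.
variant_ref = "CTGCCAGAACCCAAGAAGAGGCTGCTCGACGCTCAGGTGGAA"
transcript_ref = "CTGCCAGAACCCAAGAAGAGGCTGCTCGACGCTCAGGAGAGG"

offset = first_divergence(variant_ref, transcript_ref)
print(offset)  # 37
```

So the sequences agree for the first 37 bases after the start of the variant and differ from there on.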

The error appears to come from post-projection validation of a variant that spans the entire 3' UTR and extends past the end of the transcript.

At the very least, an out-of-bounds variant should raise an error when strict_bounds=True (as it does now), and should not be validated at all when strict_bounds=False.
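The behavior proposed above could be sketched roughly as follows. This is not hgvs code: `OutOfBoundsError`, `check_projected_interval`, and the `validate_ref` callback are hypothetical stand-ins, and positions are simplified to 1-based transcript coordinates.

```python
class OutOfBoundsError(Exception):
    """Hypothetical stand-in for hgvs's HGVSInvalidIntervalError."""


def check_projected_interval(start, end, tx_length, strict_bounds, validate_ref):
    """Decide what to do with a projected interval.

    Returns "validated" if the interval lies within the transcript and
    validate_ref was called, or "skipped" if it is out of bounds and
    strict_bounds is False. Raises OutOfBoundsError if it is out of bounds
    and strict_bounds is True.
    """
    in_bounds = 1 <= start <= end <= tx_length
    if in_bounds:
        validate_ref(start, end)  # reference check only makes sense in bounds
        return "validated"
    if strict_bounds:
        raise OutOfBoundsError("Position is beyond the bounds of transcript record")
    # Out of bounds with strict_bounds=False: no transcript sequence to compare against.
    return "skipped"


# Usage with a dummy validator that records its calls.
calls = []
record = lambda s, e: calls.append((s, e))

print(check_projected_interval(10, 20, 100, True, record))    # validated
print(check_projected_interval(90, 150, 100, False, record))  # skipped
```

With this policy the second call in the report would return a projected variant without attempting the reference comparison that currently fails.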

Also: the imputed genomic sequence in the error above diverges early, before the end of the transcript. Why is that? (Is c.3850 not in the last exon?)

@andreas-invitae