Support for circular genomic identifiers
Opened by mbiokyle29
I am currently getting a parser error when using the `o.` (circular genomic DNA) reference sequence type (from https://varnomen.hgvs.org/bg-material/refseq/). The error can be reproduced with the following code:
```python
from hgvs.easy import parse

# parse a variant from a plasmid using the circular genome
parse('plasmid:o.10A>G')
```
```
---------------------------------------------------------------------------
ParseError Traceback (most recent call last)
~/Dev/project/.venv/lib/python3.7/site-packages/hgvs/parser.py in rule_fxn(s)
124 try:
--> 125 return self._grammar(s).__getattr__(rule_name)()
126 except ometa.runtime.ParseError as exc:
~/Dev/project/.venv/lib/python3.7/site-packages/parsley.py in invokeRule(*args, **kwargs)
97 [["message", "expected EOF"]], err.trail)
---> 98 raise err
99 return invokeRule
~/Dev/project/.venv/lib/python3.7/site-packages/parsley.py in invokeRule(*args, **kwargs)
84 try:
---> 85 ret, err = self._grammar.apply(name, *args)
86 except ParseError as e:
~/Dev/project/.venv/lib/python3.7/site-packages/ometa/runtime.py in apply(self, ruleName, *args)
461 if r is not None:
--> 462 val, err = self._apply(r, ruleName, args)
463 return val, err
~/Dev/project/.venv/lib/python3.7/site-packages/ometa/runtime.py in _apply(self, rule, ruleName, args)
494 memoRec = self.input.setMemo(ruleName,
--> 495 [rule(), self.input])
496 except ParseError as e:
/pymeta_generated_code/pymeta_grammar__Grammar.py in rule_hgvs_variant(self)
37 return (_G_apply_12, self.currentError)
---> 38 _G_or_13, lastError = self._or([_G_or_1, _G_or_3, _G_or_5, _G_or_7, _G_or_9, _G_or_11])
39 self.considerError(lastError, 'hgvs_variant')
~/Dev/project/.venv/lib/python3.7/site-packages/ometa/runtime.py in _or(self, fns)
603 self.input = m
--> 604 raise joinErrors(errors)
605
ParseError:
plasmid:o.10A>G
        ^
Parse error at line 1, column 8: expected one of 'c', 'g', 'm', 'n', 'p', or 'r'. trail: [p_variant hgvs_variant]
During handling of the above exception, another exception occurred:
HGVSParseError Traceback (most recent call last)
~/Dev/project/.venv/lib/python3.7/site-packages/hgvs/shell.py in <module>
----> 1 parse('plasmid:o.10A>G')
~/Dev/project/.venv/lib/python3.7/site-packages/hgvs/parser.py in parse(self, v)
105
106 """
--> 107 return self.parse_hgvs_variant(v)
108
109 def _expose_rule_functions(self, expose_all_rules=False):
~/Dev/project/.venv/lib/python3.7/site-packages/hgvs/parser.py in rule_fxn(s)
126 except ometa.runtime.ParseError as exc:
127 raise HGVSParseError(
--> 128 "{s}: char {exc.position}: {reason}".format(s=s, exc=exc, reason=exc.formatReason())
129 )
130
HGVSParseError: plasmid:o.10A>G: char 8: expected one of 'c', 'g', 'm', 'n', 'p', or 'r'
```
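The reported position, `char 8`, is the zero-based offset of the sequence-type letter, i.e. the first character after `plasmid:`. The short sketch below uses a hypothetical helper, `locate_type_letter` (not part of hgvs), to reproduce that offset and check the letter against the six types listed in the error message. It assumes the type letter immediately follows the first `:` in the string.

```python
from typing import Tuple

# Sequence-type letters the current grammar accepts, copied from the error
# message above: "expected one of 'c', 'g', 'm', 'n', 'p', or 'r'".
ACCEPTED_TYPES = frozenset("cgmnpr")


def locate_type_letter(variant: str) -> Tuple[int, str]:
    """Return the zero-based offset and value of the sequence-type letter.

    Hypothetical helper for illustration, not part of hgvs. It assumes the
    type letter immediately follows the first ':' in the string.
    """
    offset = variant.index(":") + 1
    return offset, variant[offset]


offset, letter = locate_type_letter("plasmid:o.10A>G")
print(offset, letter, letter in ACCEPTED_TYPES)  # 8 o False
```

The offset matches the `char 8` in the `HGVSParseError`, which suggests the accession itself is accepted and parsing only fails at the `o` type letter.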
It seems `o.` is not supported by the package. This is not a huge deal, but I thought it was worth opening an issue. I'd be interested in providing a PR if I could get a little guidance on the changes required. It seems that, at a bare minimum, I would need to:
- Update this regex: https://github.com/biocommons/hgvs/blob/main/src/hgvs/parser.py#L135
- Add expanded rules for all the `o.` cases to https://github.com/biocommons/hgvs/blob/main/src/hgvs/_data/hgvs.pymeta
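To make the two proposed changes concrete, here is a minimal standalone sketch, assuming the change amounts to (1) letting the variant pattern accept `o` as a type letter and (2) routing `o` to a new grammar rule. `VARIANT_RE`, `RULE_FOR_TYPE`, and `split_variant` are hypothetical names; the pattern is NOT the regex at `parser.py#L135`, and only `p_variant` is a rule name confirmed by the traceback above. The other rule names, including the proposed `o_variant`, are assumptions.

```python
import re
from typing import Tuple

# Hypothetical pattern for illustration only; this is NOT the regex at
# hgvs/parser.py#L135. It splits "<accession>:<type>.<posedit>" and adds
# "o" (circular genomic) to the six letters c, g, m, n, p, r.
VARIANT_RE = re.compile(r"^(?P<ac>[^:]+):(?P<type>[cgmnopr])\.(?P<posedit>\S+)$")

# Hypothetical type-letter -> grammar-rule mapping. "p_variant" appears in
# the traceback above; the other names, including the new "o_variant",
# are assumed for illustration.
RULE_FOR_TYPE = {
    "c": "c_variant",
    "g": "g_variant",
    "m": "m_variant",
    "n": "n_variant",
    "o": "o_variant",  # proposed new rule for circular genomic sequences
    "p": "p_variant",
    "r": "r_variant",
}


def split_variant(hgvs_string: str) -> Tuple[str, str, str, str]:
    """Return (accession, type letter, posedit, rule name) for a variant string."""
    m = VARIANT_RE.match(hgvs_string)
    if m is None:
        raise ValueError(f"unrecognized variant string: {hgvs_string!r}")
    seq_type = m.group("type")
    return m.group("ac"), seq_type, m.group("posedit"), RULE_FOR_TYPE[seq_type]


print(split_variant("plasmid:o.10A>G"))
# ('plasmid', 'o', '10A>G', 'o_variant')
```

Since `m.` (mitochondrial) already describes a circular genomic sequence, its existing grammar rules may be a reasonable template for the new `o.` rules, though that is worth confirming with the maintainers.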
I'm guessing there are other changes required that I'm not aware of. Either way, thanks for all the hard work on this package. It has been great to work with; the shell and the `.easy` module are both great!