biocommons/hgvs

Modify parser to include common errors to provide better error messages than "syntax error"

Opened this issue · 3 comments

The current design uses Parsley to define only the correct grammar, so some common mistakes produce nothing more specific than a "syntax error".

If we extended the grammar to also match common mistakes, and then checked for those matches explicitly, we could give much more informative messages than "syntax error".

For instance, consider an insertion:

NC_000017.11:g.43091687_43091688insGATTACA

Here are some common mistakes I have seen in the wild:

Integer length for insertion

Example: NC_000017.11:g.43091687_43091688ins7
Suggested error: "Insertions require inserted sequence, rather than an integer length"

Missing insertion

Example: NC_000017.11:g.43091687_43091688ins
Suggested error: "Insertions require inserted sequence"
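To make the idea concrete, here is a rough sketch of how the two cases above could be told apart. It is only an illustration: it uses a plain regular expression as a stand-in for extra Parsley grammar rules, and `diagnose_insertion` and `_INS_RE` are hypothetical names, not part of the hgvs package. The pattern only covers simple `start_end` positions like the examples here.

```python
import re
from typing import Optional

# Hypothetical pattern (not from hgvs): accession, coordinate type,
# a simple start_end interval, then "ins" followed by anything.
_INS_RE = re.compile(r"^[^:]+:[a-z]\.\d+_\d+ins(?P<alt>.*)$")


def diagnose_insertion(variant: str) -> Optional[str]:
    """Return a specific error message for a known insertion mistake.

    Returns None when the string is not an insertion this sketch
    recognises, or when no known mistake is detected.
    """
    match = _INS_RE.match(variant)
    if match is None:
        return None
    alt = match.group("alt")
    if alt == "":
        # "ins" with nothing after it
        return "Insertions require inserted sequence"
    if re.fullmatch(r"[0-9]+", alt):
        # "ins" followed by a length instead of a sequence
        return ("Insertions require inserted sequence, "
                "rather than an integer length")
    return None


if __name__ == "__main__":
    for v in (
        "NC_000017.11:g.43091687_43091688insGATTACA",
        "NC_000017.11:g.43091687_43091688ins7",
        "NC_000017.11:g.43091687_43091688ins",
    ):
        print(v, "->", diagnose_insertion(v))
```

A check like this could run only after the normal parse fails, so valid variants would not pay any extra cost and the specific message would simply replace the generic "syntax error".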

I am happy to do the work and make a pull request; I am just raising the issue for discussion first.

Potentially related to #367
