althonos/pronto

OBO parsing fails with a misleading error message

mukund109 opened this issue · 1 comments

When parsing cellosaurus.obo downloaded from here, I get this error.

[image: screenshot of the error message]

The error is fixed when I manually remove

  1. lines starting with ! (lines 27-48)
  2. the line with the date information, date: 05:20:2021 12:00 (line 3)
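The manual cleanup above could be scripted roughly as follows. This is only a sketch: `strip_problem_lines` is a hypothetical helper, the sample text is made up for illustration (it is not taken from cellosaurus.obo), and it drops every line starting with `!` or `date:` rather than targeting specific line numbers.

```python
import io


def strip_problem_lines(lines):
    """Yield lines, skipping '!' lines and the header 'date:' line."""
    for line in lines:
        if line.startswith("!"):
            continue  # lines starting with "!" (item 1 above)
        if line.startswith("date:"):
            continue  # the date line (item 2 above)
        yield line


# Made-up sample standing in for the real file contents.
sample = io.StringIO(
    "format-version: 1.2\n"
    "date: 05:20:2021 12:00\n"
    "! a comment line\n"
    "\n"
    "[Term]\n"
    "id: EXAMPLE:0001\n"
)
cleaned = "".join(strip_problem_lines(sample))
print(cleaned)
```

For the real file, the same generator could be applied to an open file handle and the result written to a new file before passing that path to `pronto.Ontology`.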

Hi @mukund109 , three things:

  • concerning the top error message, this seems to be an IPython thing: when I run it in the base Python shell, I actually get the location of the syntax error in the OBO file:
    >>> import pronto
    >>> pronto.Ontology("cellosaurus.obo")
      File "<stdin>", line 3
        date: 05:20:2021 12:00^
    SyntaxError: expected NaiveMonth
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module> 
      File "/home/althonos/.local/lib/python3.9/site-packages/pronto/ontology.py", line 283, in __init__
        cls(self).parse_from(_handle)  # type: ignore
      File "/home/althonos/.local/lib/python3.9/site-packages/pronto/parsers/obo.py", line 18, in parse_from
        doc = fastobo.iter(handle, ordered=True)
    TypeError: expected path or binary file handle
  • concerning the error type (the fact that you get a TypeError where you would expect the SyntaxError to be raised directly), this is a bug in fastobo-py that I'm fixing at the moment.
  • concerning the cellosaurus.obo file, it cannot be parsed because it is not compatible with the OBO format version 1.4: it contains many syntax errors (which pronto cannot ignore, by design). If you are in contact with the ExPASy people, you can try reporting it to them directly; otherwise I'll send them an email to see what we can do about it. Most of the issues seem to be concentrated in the Xref IDs, so that shouldn't be too hard to fix.
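In the meantime, the underlying SyntaxError should still be reachable from the TypeError through Python's exception chaining, since the traceback above shows it as the direct cause. A minimal sketch, assuming the chain looks like that traceback: `root_cause` and `failing_parse` are hypothetical helpers, and `failing_parse` only stands in for the `pronto.Ontology("cellosaurus.obo")` call so that the example runs without the file (the column offset in it is made up).

```python
def root_cause(exc):
    """Follow the __cause__ chain and return the innermost exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


def failing_parse():
    # Stand-in for pronto.Ontology(...): mimics a TypeError raised
    # "from" a SyntaxError, as in the traceback above.
    try:
        raise SyntaxError(
            "expected NaiveMonth",
            ("<stdin>", 3, 7, "date: 05:20:2021 12:00"),  # offset is illustrative
        )
    except SyntaxError as err:
        raise TypeError("expected path or binary file handle") from err


try:
    failing_parse()
except TypeError as err:
    cause = root_cause(err)
    message = f"{type(cause).__name__}: {cause.msg} (line {cause.lineno})"
    print(message)
```

With the real call in place of `failing_parse()`, the same `except` block would report where in the OBO file the parser stopped, even under IPython.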