BioPandas/biopandas

Error handling when reading wrong file formats

dominiquesydow opened this issue · 3 comments

Describe the workflow you want to enable

Thanks again for your work on biopandas!

I have a small comment on the error handling when loading PDB files with read_mol2 (or MOL2 files with read_pdb).

The current behavior looks like this:

mol2 module
from biopandas.mol2 import PandasMol2
pmol = PandasMol2()
pmol.read_mol2("xxxx.pdb")

Example output: UnboundLocalError: local variable 'first_idx' referenced before assignment (might look different depending on the input file and file format).

pdb module
from biopandas.pdb import PandasPdb
ppdb = PandasPdb()
ppdb.read_pdb("xxxx.mol2")

Example output: No error is raised; all data is loaded into the dict key "OTHERS" (might look different depending on the input file and file format).
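
To make the silent PDB case concrete, here is a minimal sketch of a check that could run after reading. The helper name check_pdb_frames is hypothetical and not part of biopandas; it only assumes that the parsed result is a dict-like object whose "ATOM" and "HETATM" entries support len(), which is the case for a dict of pandas DataFrames.

```python
from typing import Mapping, Sized


def check_pdb_frames(frames: Mapping[str, Sized], path: str = "<input>") -> None:
    """Raise ValueError if neither ATOM nor HETATM records were parsed.

    Hypothetical helper: `frames` is any mapping from record name to a sized
    container (for example a dict of DataFrames).
    """
    # Count coordinate records; keys that are missing count as zero.
    n_coords = sum(len(frames[key]) for key in ("ATOM", "HETATM") if key in frames)
    if n_coords == 0:
        raise ValueError(
            f"No ATOM or HETATM records could be loaded from {path}. "
            "Is the input in PDB format?"
        )
```

With biopandas this could be called as check_pdb_frames(ppdb.df, "xxxx.mol2") right after read_pdb, assuming ppdb.df is the dict of DataFrames described above.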

Describe your proposed solution

Would you consider adding a check for the correct input format and raising a descriptive error message?

I am using a ValueError at the moment but I am sure there are nicer ways to handle this:
https://github.com/volkamerlab/opencadd/blob/912d4e98e89edf38707249fd4f034cea136e1932/opencadd/io/dataframe.py#L202
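
For illustration only, a pre-read check could sniff the file content instead of trusting the file extension. This is a sketch under stated assumptions, not the code at the link above and not biopandas API: guess_format and require_format are hypothetical names, MOL2 input is recognized by its "@<TRIPOS>" record type indicators, and PDB input by lines starting with the ATOM or HETATM record names.

```python
from typing import Optional


def guess_format(text: str) -> Optional[str]:
    """Return "mol2", "pdb", or None based on the first recognizable line."""
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>"):
            return "mol2"
        # PDB record names occupy the first six columns of a line.
        if line.startswith(("ATOM  ", "HETATM")):
            return "pdb"
    return None


def require_format(text: str, expected: str) -> None:
    """Raise ValueError if the content does not look like `expected`."""
    found = guess_format(text)
    if found != expected:
        seen = found if found is not None else "unknown"
        raise ValueError(
            f"Expected {expected} input but the content looks like {seen}."
        )
```

A caller would run require_format(text, "mol2") before handing the text to the MOL2 reader, so that a PDB file fails early with a readable message.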

This issue is not urgent at all.
It would simply make it easier and less verbose to use biopandas in other packages, where we try to catch common user mistakes.

Thank you again for your time and work!

Describe alternatives you've considered, if relevant

None.

Additional context

None.

rasbt commented

Thanks for the feedback! I hadn't thought of user errors like that (yet), and I like the suggestion of raising a more descriptive ValueError, e.g., "No structural data could be loaded. Is the input text in mol2 format?". I'd appreciate a PR if you have time at some point.
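
As a rough sketch of how such a message could replace the UnboundLocalError, the function below initializes the index to None and checks it before use. It is not biopandas's parser and not necessarily what any later PR implemented; mol2_atom_lines is a hypothetical name, and first_idx is borrowed from the traceback quoted in this issue.

```python
from typing import List, Optional


def mol2_atom_lines(text: str) -> List[str]:
    """Return the non-empty lines of the first @<TRIPOS>ATOM section."""
    lines = text.splitlines()
    first_idx: Optional[int] = None  # stays None if the marker is absent
    for idx, line in enumerate(lines):
        if line.startswith("@<TRIPOS>ATOM"):
            first_idx = idx + 1  # atom records start on the next line
            break
    if first_idx is None:
        raise ValueError(
            "No structural data could be loaded. "
            "Is the input text in mol2 format?"
        )
    atom_lines = []
    for line in lines[first_idx:]:
        if line.startswith("@<TRIPOS>"):
            break  # the next section begins here
        if line.strip():
            atom_lines.append(line)
    return atom_lines
```

Because the index is always bound before it is read, a non-MOL2 input ends in a ValueError that downstream packages can catch, rather than an error about an unassigned local variable.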

@rasbt, I tried out a few things as described in PR #73. If you have time, I am happy to hear your feedback (this issue and issue #70 are not urgent at all).

@rasbt I am closing this issue, since it was addressed in PR #73. Thanks for cutting a new release including the changes!