BioPandas/biopandas

read_mol2 fails on normal-looking file

Closed this issue · 2 comments

Describe the bug

Steps/Code to Reproduce

from biopandas.mol2 import PandasMol2

PandasMol2().read_mol2('file3_mod.mol2')

Expected Results

I expect biopandas to properly interpret the following mol2 file (uploaded with .txt extension for compatibility with markdown):

file3_mod.txt

Actual Results

See screenshots of the traceback:
Screenshot 2023-07-28 at 8 28 35 AM
Screenshot 2023-07-28 at 8 28 45 AM

Versions

biopandas 0.4.1
Linux-5.15.109+-x86_64-with-glibc2.35
Python 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
Scikit-learn 1.3.0
NumPy 1.22.4
SciPy 1.11.1

a-r-j commented

Hi @RafiBrent, sorry to hear you’re experiencing a bug. Unfortunately, I don’t believe we have the capacity as maintainers to resolve this right now. I’d hope that either you or someone from the community could contribute a fix on this occasion.

I think I found the problem. There is an empty line between your ATOM and BOND blocks. In the code:

for idx, s in enumerate(mol2_lst):
    if s.startswith("@<TRIPOS>ATOM"):
        first_idx = idx + 1
        started = True
    elif started and s.startswith("@<TRIPOS>"):
        last_idx_plus1 = idx
        break

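The loop only stops at the next line starting with @<TRIPOS>, so the empty line falls between first_idx and last_idx_plus1 and is counted as an ATOM line. If you have a single file, just deleting the empty line should work. If you have many files, you can either try my pdbx2df package and use its read_mol2 function, or wait for my PR for this repo.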
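To illustrate the point, here is a minimal standalone sketch that mirrors the loop quoted above and shows the blank line ending up in the ATOM slice. It is not the biopandas implementation and not the contents of the pending PR; the function name atom_section, the skip_blank flag, and the sample records are hypothetical and exist only for this example.

```python
def atom_section(mol2_lst, skip_blank=True):
    """Return the lines between @<TRIPOS>ATOM and the next @<TRIPOS> record.

    Hypothetical helper for illustration only; it mirrors the loop quoted
    in this thread and optionally drops whitespace-only lines.
    """
    first_idx = None
    last_idx_plus1 = len(mol2_lst)  # fall back to end of file if no later record
    started = False
    for idx, s in enumerate(mol2_lst):
        if s.startswith("@<TRIPOS>ATOM"):
            first_idx = idx + 1
            started = True
        elif started and s.startswith("@<TRIPOS>"):
            last_idx_plus1 = idx
            break
    if first_idx is None:
        raise ValueError("no @<TRIPOS>ATOM record found")
    section = mol2_lst[first_idx:last_idx_plus1]
    if skip_blank:
        # Drop empty or whitespace-only lines so they are not parsed as atoms.
        section = [s for s in section if s.strip()]
    return section


# Made-up sample data with a blank line between the ATOM and BOND blocks.
lines = [
    "@<TRIPOS>MOLECULE\n",
    "example\n",
    "@<TRIPOS>ATOM\n",
    "      1 C1   0.0 0.0 0.0 C.3 1 LIG 0.0\n",
    "      2 O1   1.2 0.0 0.0 O.3 1 LIG 0.0\n",
    "\n",
    "@<TRIPOS>BOND\n",
    "     1     1     2    1\n",
]

raw = atom_section(lines, skip_blank=False)  # blank line is kept, as in the quoted loop
clean = atom_section(lines)                  # blank line is filtered out
```

With skip_blank=False the slice contains three entries, the last being the bare newline; with the filter on, only the two atom records remain. The same filter could be applied to a file's lines before writing a cleaned copy for read_mol2, if editing many files by hand is impractical.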