BioPandas/biopandas

Stream support for exporting pdbs not working with OTHERS record

Closed · 1 comment

Describe the bug

When trying to export pdb data with ATOM and OTHERS entries using .to_pdb_stream I always get a pandas.errors.IntCastingNaNError (cf. Steps/Code to Reproduce).
As I need to maintain the TER markers in the resulting pdb data, the content of the OTHERS frame is necessary.

Writing directly to a pdb file with .to_pdb does not trigger this issue. A possible fix would be a shared base function used by both methods, or a parameter on to_pdb that selects the desired output (i.e. file or stream), as mentioned in #108.

Steps/Code to Reproduce

Example:

from biopandas.pdb import PandasPdb

pdb_df = PandasPdb().fetch_pdb('1ou5')
out_string = pdb_df.to_pdb_stream(records=('ATOM', 'OTHERS'))

Expected Results

Stream containing the specified records in pdb format.

Actual Results

A pandas.errors.IntCastingNaNError stemming from Line 909 in pandas_pdb.py

df.residue_number = df.residue_number.astype(int)

which is executed on the entire concatenated DataFrame.
As the OTHERS frame doesn't contain residue number entries, these cells are always NaN after concatenating.
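
The mechanism can be reproduced without biopandas or network access. The following is a minimal sketch using two toy frames as stand-ins for the ATOM and OTHERS frames; the column values are made up for illustration and this is not the code in pandas_pdb.py, only the same concat-then-cast pattern.

```python
import pandas as pd

# Toy stand-ins for the ATOM and OTHERS frames (values are illustrative only).
atom = pd.DataFrame(
    {"record_name": ["ATOM", "ATOM"], "residue_number": [1, 2], "line_idx": [0, 1]}
)
# The OTHERS-like frame has no residue_number column at all.
others = pd.DataFrame(
    {"record_name": ["TER"], "entry": ["placeholder"], "line_idx": [2]}
)

# Concatenating fills the missing residue_number cell with NaN,
# which turns the whole column into float64.
df = pd.concat([atom, others], sort=False)

try:
    df["residue_number"].astype(int)
    cast_failed = False
except pd.errors.IntCastingNaNError:
    # Raised because NaN has no integer representation.
    cast_failed = True

print(df["residue_number"].dtype, cast_failed)
```

Casting only the ATOM rows, or casting before concatenating, would avoid the error in this toy setup.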

Versions

biopandas 0.5.0dev
Linux-5.4.0-91-generic-x86_64-with-glibc2.31
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
Scikit-learn 1.3.0
NumPy 1.23.5
SciPy 1.11.1

Hi @gate-tec thanks for raising.

I think we should switch this to pd.to_numeric(df.residue_number, errors='coerce') and subsequently strip the NaNs. What do you think?
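
As a rough sketch of what that suggestion could look like, the snippet below applies pd.to_numeric with errors='coerce' to a toy column and then handles the NaNs in two possible ways. The frame and the names nullable and as_text are hypothetical, and "strip the NaNs" is read here as "render missing values as empty strings"; the actual fix in biopandas may differ.

```python
import pandas as pd

# Toy frame: two ATOM rows plus one OTHERS-like row without a residue number.
df = pd.DataFrame(
    {
        "record_name": ["ATOM", "ATOM", "TER"],
        "residue_number": [1.0, 2.0, float("nan")],
    }
)

# errors="coerce" turns anything unparsable into NaN instead of raising.
numeric = pd.to_numeric(df["residue_number"], errors="coerce")

# Option A (assumption): keep all rows and use pandas' nullable integer dtype,
# which can hold missing values alongside integers.
nullable = numeric.astype("Int64")

# Option B (assumption): format for text output, writing "" where the value is missing.
as_text = numeric.map(lambda v: "" if pd.isna(v) else str(int(v)))

print(nullable.tolist())
print(as_text.tolist())
```

Either option keeps the OTHERS rows in the frame, so TER markers would still reach the output.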