BioPandas/biopandas

Sort when Saving PDB

RMeli opened this issue · 3 comments

RMeli commented

When using to_pdb() to save a PandasPdb() object with modified data frames, the following warning is issued:

FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default.

  To accept the future behavior, pass 'sort=False'.

  To retain the current behavior and silence the warning, pass 'sort=True'.

    df = pd.concat(dfs)

Unfortunately, to_pdb() does not accept (or forward) a sort=False keyword argument. I need to write the new data frames unsorted, but this does not seem to be possible at the moment (despite what the warning is suggesting).
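For context, here is a minimal sketch of what the `sort` keyword of `pd.concat` controls. The two small frames are made up for illustration and only loosely mimic the record tables of a PandasPdb object. As far as the pandas documentation describes it, `sort` orders the non-concatenation axis (here, the column labels) when the inputs have different columns; it does not reorder rows.

```python
import pandas as pd

# Two illustrative frames with non-aligned columns (made-up data).
atom = pd.DataFrame(
    {"record_name": ["ATOM", "ATOM"], "x_coord": [1.0, 2.0], "line_idx": [0, 1]}
)
others = pd.DataFrame(
    {"record_name": ["REMARK"], "entry": ["example"], "line_idx": [2]}
)

# sort=False keeps the column labels unsorted; sort=True sorts them.
unsorted_cols = pd.concat([atom, others], sort=False)
sorted_cols = pd.concat([atom, others], sort=True)

print(list(unsorted_cols.columns))
print(list(sorted_cols.columns))

# The row order is the same in both results.
print(list(unsorted_cols["line_idx"]), list(sorted_cols["line_idx"]))
```

In other words, passing `sort=False` to the concatenation would silence the warning but would not, by itself, change the order in which rows are written.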

Conda environment:

Python: 3.6
BioPandas: 0.2.3
Pandas: 0.23.4
Numpy: 1.15.4
Scipy: 1.2.0

rasbt commented

Thanks for the note. As far as I can tell, this should make no difference: the sorting order is fixed (based on the PDB format), so this is just a warning that has no effect on the output.

I.e., the concatenation is followed by the lines

        if pd.__version__ < '0.17.0':
            warn("You are using an old pandas version (< 0.17)"
                 " that relies on the old sorting syntax."
                 " Please consider updating your pandas"
                 " installation to a more recent version.",
                 DeprecationWarning)
            df.sort(columns='line_idx', inplace=True)
        else:
            df.sort_values(by='line_idx', inplace=True)

such that the sorting in pd.concat won't make any difference.
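To illustrate the point, here is a small sketch with made-up frames (not the actual BioPandas internals) showing that a `sort_values(by='line_idx')` call after the concatenation restores the original line order, whatever order the rows had beforehand:

```python
import pandas as pd

# Made-up ATOM rows, then reordered by the user (C, N, CA).
atom = pd.DataFrame({"atom_name": ["N", "CA", "C"], "line_idx": [0, 1, 2]})
custom = atom.iloc[[2, 0, 1]]
hetatm = pd.DataFrame({"atom_name": ["O"], "line_idx": [3]})

df = pd.concat([custom, hetatm], sort=False)
print(list(df["atom_name"]))  # ['C', 'N', 'CA', 'O']

# Mirrors the sort_values call quoted above: rows return to line_idx order.
df = df.sort_values(by="line_idx")
print(list(df["atom_name"]))  # ['N', 'CA', 'C', 'O']
```

This is also why a custom row order does not survive saving as long as the `line_idx` values are unchanged.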

I can update the package though to suppress the warning.

RMeli commented

Thanks for the fast reply and for the clarification. However, I actually need to save the data frames in the PandasPdb() with my own custom sorting. Would you be interested in a PR allowing the user to avoid sorting the data frames when a PDB is saved (with an additional sort=False argument)? If yes, any pointers would be appreciated.
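A rough sketch of what such an option could look like, assuming the saving code concatenates the record frames and then orders them by `line_idx`. The helper name `combine_records` and its `sort` parameter are hypothetical and are not part of the BioPandas API:

```python
import pandas as pd


def combine_records(frames, sort=True):
    """Hypothetical helper: concatenate record frames.

    With sort=True the rows are ordered by 'line_idx' (the current
    behavior described in this thread); with sort=False the caller's
    row order is kept as-is.
    """
    df = pd.concat(frames, sort=False)
    if sort:
        df = df.sort_values(by="line_idx")
    return df


# Made-up frames; the ATOM rows are already in a custom order.
atom = pd.DataFrame({"atom_name": ["C", "N", "CA"], "line_idx": [2, 0, 1]})
hetatm = pd.DataFrame({"atom_name": ["O"], "line_idx": [3]})

default_order = list(combine_records([atom, hetatm])["atom_name"])
custom_order = list(combine_records([atom, hetatm], sort=False)["atom_name"])
print(default_order)  # ['N', 'CA', 'C', 'O']
print(custom_order)   # ['C', 'N', 'CA', 'O']
```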

rasbt commented

Oh I see.

Regarding

Would you be interested in a PR allowing the user to avoid sorting the data frames when a PDB is saved (with an additional sort=False argument)

Generally yes, but since the result would no longer correspond to a regular PDB file, I wonder if something like

ppdb.df['ATOM'].to_csv('file.txt', sep='\t', header=None)

or

pd.concat((ppdb.df['ATOM'], ppdb.df['HETATM'])).to_csv('file.txt', sep='\t', header=None)

would not already solve your problem?
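As a self-contained illustration of this suggestion, the sketch below uses made-up frames in place of `ppdb.df['ATOM']` and `ppdb.df['HETATM']` and writes to an in-memory buffer instead of `file.txt`. Passing `index=False` is an extra choice here to keep the row index out of the output; it is not part of the snippets above.

```python
import io

import pandas as pd

# Stand-ins for ppdb.df['ATOM'] and ppdb.df['HETATM'] (made-up data),
# with the ATOM rows already in a custom order.
atom = pd.DataFrame(
    {"record_name": ["ATOM", "ATOM"], "atom_name": ["CA", "N"], "x_coord": [2.0, 1.0]}
)
hetatm = pd.DataFrame(
    {"record_name": ["HETATM"], "atom_name": ["O"], "x_coord": [3.0]}
)

# Tab-separated output without a header; rows keep the order they have
# in the concatenated frame.
buffer = io.StringIO()
pd.concat((atom, hetatm)).to_csv(buffer, sep="\t", header=False, index=False)
text = buffer.getvalue()
print(text)
```

The output is a plain tab-separated table rather than a fixed-width PDB file, so it would need a custom reader on the other end.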