Sort when Saving PDB
RMeli opened this issue · 3 comments
When using to_pdb() to save a PandasPdb() object with modified data frames, the following warning is issued:
FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
df = pd.concat(dfs)
Unfortunately, to_pdb() does not accept (or forward) a sort=False keyword argument. I need to write the new data frames unsorted, but this does not seem to be possible at the moment (despite what the warning suggests).
Conda environment:
Python: 3.6
BioPandas: 0.2.3
Pandas: 0.23.4
Numpy: 1.15.4
Scipy: 1.2.0
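For context, the warning concerns the column axis. pd.concat stacks the frames row-wise, and when their column sets differ (the "non-concatenation axis is not aligned"), older pandas versions such as the 0.23 used here sorted the union of the columns by default. The row order is not affected by this. A minimal pandas-only sketch with made-up toy frames (the column names are illustrative, not the real biopandas record schema); it passes sort explicitly, so it should behave the same on old and new pandas:

```python
import pandas as pd

# Toy stand-ins for two record tables whose column sets differ
# (hypothetical columns, not the real biopandas schema).
atom = pd.DataFrame({"record_name": ["ATOM"], "x_coord": [1.0], "line_idx": [0]})
other = pd.DataFrame({"record_name": ["TER"], "line_idx": [1], "entry": [""]})

# sort=False: the union of columns keeps the order of first appearance.
unsorted_df = pd.concat([atom, other], sort=False)

# sort=True: the union of columns is sorted alphabetically.
sorted_df = pd.concat([atom, other], sort=True)

print(unsorted_df.columns.tolist())  # ['record_name', 'x_coord', 'line_idx', 'entry']
print(sorted_df.columns.tolist())    # ['entry', 'line_idx', 'record_name', 'x_coord']

# In both cases the rows stay in concatenation order.
print(sorted_df["record_name"].tolist())  # ['ATOM', 'TER']
```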
Thanks for the note. As far as I can tell, this should make no difference: the row order is fixed afterwards (following the PDB format), so this is just a warning without any effect on the output.
That is, the concatenation is followed by these lines:
if pd.__version__ < '0.17.0':
    warn("You are using an old pandas version (< 0.17)"
         " that relies on the old sorting syntax."
         " Please consider updating your pandas"
         " installation to a more recent version.",
         DeprecationWarning)
    df.sort(columns='line_idx', inplace=True)
else:
    df.sort_values(by='line_idx', inplace=True)
so the sorting done inside pd.concat() does not change the result.
I can update the package to suppress the warning, though.
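The maintainer's point can be illustrated with plain pandas (toy frames; only the line_idx column name comes from the quoted snippet): whatever pd.concat does, the subsequent sort_values(by='line_idx') determines the row order. The same sketch also shows the flip side, namely that any row reordering applied to the data frames is undone unless line_idx changes as well.

```python
import pandas as pd

# Toy record tables; line_idx records each row's position in the original file.
atom = pd.DataFrame({"record_name": ["ATOM", "ATOM"], "line_idx": [0, 1]})
hetatm = pd.DataFrame({"record_name": ["HETATM"], "line_idx": [2]})

# A custom reordering applied by the user (here: reverse the ATOM rows).
atom = atom.iloc[::-1]

# Mimic the quoted code: concatenate (in any order), then sort rows by line_idx.
df = pd.concat([hetatm, atom], sort=False)
df.sort_values(by="line_idx", inplace=True)

print(df["line_idx"].tolist())     # [0, 1, 2]
print(df["record_name"].tolist())  # ['ATOM', 'ATOM', 'HETATM']
```

The custom reordering of the ATOM rows does not survive the final sort, because only line_idx decides the output order.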
Thanks for the fast reply and for the clarification. However, I actually need to save the data frames in the PandasPdb() with my own custom sorting. Would you be interested in a PR allowing the user to avoid sorting the data frames when a PDB is saved (with an additional sort=False argument)? If yes, any pointers would be appreciated.
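Until such an option exists, one possible workaround is to renumber line_idx so that it encodes the desired order before saving. This is an assumption based only on the quoted snippet, which orders the output rows by line_idx; it has not been checked against the full to_pdb() implementation. A pandas-only sketch; apply_custom_order is a hypothetical helper name:

```python
import pandas as pd


def apply_custom_order(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: make line_idx follow the current row order.

    Assumes the writer sorts rows by 'line_idx' before writing,
    as in the snippet quoted in this thread.
    """
    out = df.copy()
    # Assigning a range is positional: the first row gets 0, the next 1, etc.
    out["line_idx"] = range(len(out))
    return out


atom = pd.DataFrame({"atom_name": ["N", "CA", "C"], "line_idx": [0, 1, 2]})

# Custom order chosen by the user: C, N, CA.
custom = atom.iloc[[2, 0, 1]]
renumbered = apply_custom_order(custom)

# A re-sort by line_idx now preserves the custom order.
print(renumbered.sort_values(by="line_idx")["atom_name"].tolist())  # ['C', 'N', 'CA']
```

If several record tables (for example ATOM and HETATM) are saved together, the new line_idx values would have to be unique across all of them, for instance by offsetting each table, otherwise rows from different tables could interleave unexpectedly.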
Oh, I see.
Regarding:
"Would you be interested in a PR allowing the user to avoid sorting the data frames when a PDB is saved (with an additional sort=False argument)?"
Generally yes, but since the output would then no longer correspond to a regular PDB file, I wonder whether something like
ppdb.df['ATOM'].to_csv('file.txt', sep='\t', header=None)
or
pd.concat((ppdb.df['ATOM'], ppdb.df['HETATM'])).to_csv('file.txt', sep='\t', header=None)
would already solve your problem?
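A small sketch of the suggested route with toy columns, writing to an in-memory buffer so the result can be inspected. Two small deviations from the one-liners above, both assumptions about what is wanted: index=False is added so the pandas row index is not written as an extra first column (to_csv writes it by default), and header=False is used as the documented spelling of "no header row". Note that the result is tab-separated text, not fixed-width PDB columns.

```python
import io

import pandas as pd

# Toy record tables, already in the user's custom row order.
atom = pd.DataFrame({"record_name": ["ATOM", "ATOM"], "atom_name": ["CA", "N"]})
hetatm = pd.DataFrame({"record_name": ["HETATM"], "atom_name": ["O"]})

buf = io.StringIO()
# Rows are written exactly in concatenation order; nothing re-sorts them
# by line_idx on this path.
pd.concat((atom, hetatm)).to_csv(buf, sep="\t", header=False, index=False)

print(buf.getvalue().splitlines())  # ['ATOM\tCA', 'ATOM\tN', 'HETATM\tO']
```

To write to disk instead, pass a path such as 'file.txt' in place of buf.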