mcocdawc/chemcoord

Pandas 1.5 warning, 2.0 failure

ghutchis opened this issue · 4 comments

Problem description

When using the code, pandas 1.5 emits warnings such as:

/Users/ghutchis/Devel/chemcoord/src/chemcoord/_generic_classes/generic_core.py:51: FutureWarning: Passing a set as an indexer is deprecated and will raise in a future version. Use a list instead.
  new_frame = data.loc[atoms, set(new_cols) - set(self.columns)]
…
/Users/ghutchis/Devel/chemcoord/src/chemcoord/cartesian_coordinates/_indexers.py:15: FutureWarning: Passing a set as an indexer is deprecated and will raise in a future version. Use a list instead.
  selected = self.molecule._frame.loc[key]

This is with pandas-1.5. In my environment with pandas-2.0.x, the code fails because passing a set as an indexer is no longer supported.
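A minimal reproduction of the behaviour described above. It assumes a small toy DataFrame standing in for chemcoord's internal `_frame`; the data and the helper name `set_indexer_rejected` are hypothetical. Under pandas 1.5 the deprecation shows up as a `FutureWarning`, which the sketch promotes to an exception. pandas 2.x raises `TypeError` directly.

```python
import warnings

import pandas as pd

# Toy frame standing in for a Cartesian molecule frame (hypothetical data).
frame = pd.DataFrame(
    {"atom": ["O", "H", "H"], "x": [0.0, 0.76, -0.76], "y": [0.0, 0.59, 0.59]},
    index=[1, 2, 3],
)


def set_indexer_rejected(df: pd.DataFrame) -> bool:
    """Return True if this pandas version warns about or rejects a set indexer."""
    with warnings.catch_warnings():
        # Promote the pandas 1.5 FutureWarning to an exception.
        warnings.simplefilter("error", FutureWarning)
        try:
            df.loc[{1, 2}]
        except FutureWarning:  # pandas 1.5: deprecated
            return True
        except TypeError:  # pandas >= 2.0: no longer supported
            return True
    return False


print(pd.__version__, set_indexer_rejected(frame))
```

On any pandas from 1.5 onward, this should report that set indexers are rejected. Replacing `{1, 2}` with the list `[1, 2]` makes the lookup work in both versions.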

Expected Output

No warnings or errors with pandas 1.5 or 2.0.

Output of cc.show_versions()

INSTALLED VERSIONS
------------------
python: 3.9.16.final.0
python-bits: 64
OS: Darwin
OS-release: 22.4.0
machine: arm64
processor: arm
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

chemcoord: 2.0.5
numpy: 1.24.3
scipy: 1.10.1
pandas: 1.5.3
numba: 0.57.0
sortedcontainers: 2.4.0
sympy: 1.11.1
pytest: 7.3.1
pip: 21.3
setuptools: None
IPython: 8.2.0
sphinx: 4.5.0

This actually duplicates #72.

Happy to help fix this one, since it would be nice to use future versions of pandas.

Dear Professor Hutchison,

Thank you very much for this report.
This is a good reminder to myself to fix warnings as soon as they pop up.

I know that I never relied on any ordering when using sets.
I chose the datatype because it self-documents that the elements really do form a set: I do not care about their order, they are unique, and I often take intersections and complements.
It might even benefit performance.
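If set semantics are the main reason for using `set`, pandas' own `Index` objects offer similar algebra (`Index.difference`, `Index.intersection`, `Index.union`) while remaining valid `.loc` indexers in every pandas version. The sketch below is an alternative, not what chemcoord currently does; the frame and column names are hypothetical, loosely modelled on the `set(new_cols) - set(self.columns)` line in the warning.

```python
import pandas as pd

# Hypothetical toy frame and column list.
frame = pd.DataFrame(
    {"atom": ["O", "H", "H"], "x": [0.0, 0.76, -0.76]}, index=[1, 2, 3]
)
new_cols = ["atom", "x", "y", "z"]

# Plays the role of set(new_cols) - set(frame.columns); by default the
# result is sorted where possible, so the order is deterministic.
missing = pd.Index(new_cols).difference(frame.columns)

# Plays the role of set intersection on row labels.
shared_rows = frame.index.intersection([2, 3, 4])

# Both results are Index objects, which .loc accepts directly.
subset = frame.loc[shared_rows, ["atom"]]
print(list(missing), list(shared_rows))
```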

The obvious quick fix is to cast to a list or tuple where needed, but that seems a bit ugly to my eyes.
I need to think about it a bit more and would be happy to hear cleaner ideas.
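For comparison, the quick fix mentioned above could be confined to a single conversion point so the rest of the code keeps using sets. `as_indexer` is a hypothetical helper, not part of chemcoord. It assumes the labels are mutually comparable so that `sorted` can give a deterministic order.

```python
import pandas as pd


def as_indexer(keys):
    """Hypothetical helper: turn a set into a list that .loc accepts."""
    if isinstance(keys, (set, frozenset)):
        # sorted() gives a deterministic order; assumes comparable labels.
        return sorted(keys)
    return keys


# Hypothetical toy data.
frame = pd.DataFrame(
    {"atom": ["O", "H", "H"], "x": [0.0, 0.76, -0.76], "y": [0.0, 0.59, 0.59]},
    index=[1, 2, 3],
)
atoms = {3, 1}
new_cols = ["atom", "x", "y"]
old_cols = ["atom"]

# Set algebra stays as before; only the .loc call site changes.
selected = frame.loc[as_indexer(atoms), as_indexer(set(new_cols) - set(old_cols))]
print(selected)
```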

A possible solution could be to use numpy set operations; more on this at https://numpy.org/doc/stable/reference/routines.set.html . I'm working on this solution. Could you tell me where and how you have used sets?
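A small sketch of the numpy route suggested above, using `np.setdiff1d`, `np.intersect1d` and `np.union1d` from the linked set routines. Each returns a sorted 1-D array of unique values, which `.loc` accepts as a list-like indexer. The frame and label arrays are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame and label arrays.
frame = pd.DataFrame({"atom": ["O", "H", "H", "C"]}, index=[1, 2, 3, 4])
bonded = np.array([4, 2, 2, 1])
chosen = np.array([2, 3])

# Complement of `chosen` within the frame's labels: array([1, 4])
complement = np.setdiff1d(frame.index.to_numpy(), chosen)
# Intersection, duplicates removed: array([2])
common = np.intersect1d(bonded, chosen)
# Union, sorted and unique: array([1, 2, 3, 4])
union = np.union1d(bonded, chosen)

# The arrays can be passed to .loc directly.
print(frame.loc[complement, "atom"].tolist())
```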

Sets should be faster than numpy arrays at taking intersections and complements, so efficiency-wise the best solution will probably be a hybrid of the two.
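One possible reading of the hybrid idea: keep the set algebra in Python sets, and convert only where pandas needs an indexer. The sketch uses `Index.isin`, which accepts a set, to pick labels in the frame's own order. `loc_by_sets` and the toy frame are hypothetical and make no claim about how chemcoord will implement this.

```python
import pandas as pd


def loc_by_sets(frame: pd.DataFrame, rows: set, cols: set) -> pd.DataFrame:
    """Hypothetical helper: select with set-valued keys, keeping the frame's order."""
    # isin accepts a set; the result follows the frame's existing label order.
    row_labels = frame.index[frame.index.isin(rows)]
    col_labels = frame.columns[frame.columns.isin(cols)]
    return frame.loc[row_labels, col_labels]


# Hypothetical toy frame.
frame = pd.DataFrame(
    {
        "atom": ["O", "H", "H"],
        "x": [0.0, 0.76, -0.76],
        "y": [0.0, 0.59, 0.59],
        "z": [0.0, 0.0, 0.0],
    },
    index=[1, 2, 3],
)

# Intersections and complements stay in plain Python sets.
fragment = {1}
rest = set(frame.index) - fragment
sub = loc_by_sets(frame, rest, {"x", "y"})
print(sub)
```

Ordering by the frame rather than by sorting keeps the output stable even when the labels are not mutually comparable.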