kuelumbus/rdkit-pypi

Rename rdkit-pypi to rdkit; avoid conflicts with conda

Closed this issue · 14 comments

Hi there, thanks for building such a great package!

It's currently possible to install rdkit from both conda and pip simultaneously. I understand that you may not have access to rdkit on PyPI (so simply changing the name in setup.py may be off the table), but I'm wondering if there's any way to rename the package so that it replaces the conda version instead of sitting alongside it?

Note that I don't think there are import conflicts here; the package still gets installed to the same place. But the package managers could show different things and cause uncertainty about what is actually imported. For example, I installed rdkit-pypi (2021.9.5.1) and then conda rdkit (2021.03.5).

pip list shows:

rdkit-pypi                        2021.9.5.1

while conda list has two entries:

rdkit                     2021.03.5        py39h88273a1_0    conda-forge
rdkit-pypi                2021.9.5.1               pypi_0    pypi

and rdkit.__version__ is '2021.03.5'
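A minimal sketch of how one might check this from Python, using only the standard library. The helper name `report_versions` is hypothetical (it is not part of rdkit, pip, or conda); it compares the version of whatever actually gets imported with the versions recorded in the installed distribution metadata:

```python
import importlib
from importlib import metadata


def report_versions(module_name, dist_names):
    """Compare the imported module's version with recorded distribution metadata.

    `report_versions` is a hypothetical helper written for illustration.
    """
    try:
        module = importlib.import_module(module_name)
        imported = getattr(module, "__version__", None)
        location = getattr(module, "__file__", None)
    except ImportError:
        # The module is not importable in this environment.
        imported, location = None, None

    recorded = {}
    for name in dist_names:
        try:
            recorded[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # No dist-info/egg-info metadata exists under this name.
            recorded[name] = None

    return {"imported": imported, "location": location, "metadata": recorded}


if __name__ == "__main__":
    # In the situation described above, "imported" and the recorded
    # metadata versions may disagree.
    print(report_versions("rdkit", ["rdkit", "rdkit-pypi"]))
```

The `location` entry shows which files on disk the import resolves to, which is the part that neither `pip list` nor `conda list` tells you directly.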

This is a good point. I believe the best solution is to get access to rdkit on PyPI. I believe the maintainer is Greg (from the RDKit core team). I will contact him.

I am not sure, though, whether this will solve the issue. I do not have much experience with conda. Can you please try installing a Python package that has the same name in both places (numpy, for example) with conda and with pip (i.e., the pip in the conda environment)? Do they coexist? If so, this would not help. I assume conda always picks the conda-forge version over the PyPI version.

I see similar behavior with numpy, so maybe the best solution is just to avoid using both conda and pip :)

pip list:

numpy       1.22.4

conda list:

numpy                     1.22.4                   pypi_0    pypi
numpy-base                1.22.3           py39h3b1a694_0

and:

>>> np.__version__
'1.22.3'

That said, it would still be nice to be able to pip install rdkit so I think asking @greglandrum about that is a good idea.

@kuelumbus actually I think numpy is a special case due to its complexity. For a simpler package like seaborn there does not appear to be any conflict (although conda shows the wrong version) as there is only a single package name.

Does this mean that even if RDKit would have the same name, it is possible to have a conda and pip version installed in one conda environment? (I assume conda installs packages to a different directory than pip.)

Also, why do you have two rdkit versions installed in one conda environment (or how did this happen)? Is this a standard case conda users end up with?

> Does this mean that even if RDKit would have the same name, it is possible to have a conda and pip version installed in one conda environment? (I assume conda installs packages to a different directory than pip.)

No, you'll just have one install. Conda and pip both install packages in .../site-packages/, but they store their metadata separately so the version shown by conda|pip list could be different even though they are pointing to the same source files.
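To make the "separate metadata" point concrete, here is a rough sketch. It assumes conda's records are the JSON files under `<env prefix>/conda-meta/` (each with `name` and `version` fields) and that pip's view is what `importlib.metadata` reads from the `*.dist-info` / `*.egg-info` directories. The function names are hypothetical, and comparing version strings for inequality is a simplification:

```python
import json
from importlib import metadata
from pathlib import Path


def conda_records(prefix):
    """Return {name: version} from the conda-meta JSON records under `prefix`."""
    records = {}
    for path in sorted(Path(prefix, "conda-meta").glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if "name" in data and "version" in data:
            records[data["name"]] = data["version"]
    return records


def pip_records():
    """Return {name: version} from the distribution metadata visible on sys.path."""
    records = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:
            records[name] = dist.version
    return records


def _norm(name):
    # Simplified name normalization: case-insensitive, "_" treated as "-".
    return name.lower().replace("_", "-")


def disagreements(conda, pip):
    """Names present in both stores whose recorded version strings differ."""
    pip_by_name = {_norm(k): v for k, v in pip.items()}
    out = {}
    for name, version in conda.items():
        other = pip_by_name.get(_norm(name))
        if other is not None and other != version:
            out[name] = (version, other)
    return out
```

Inside a conda environment, something like `disagreements(conda_records(sys.prefix), pip_records())` (after `import sys`) would list the packages where the two bookkeeping systems report different versions for what may be the same files.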

> Also, why do you have two rdkit versions installed in one conda environment (or how did this happen)? Is this a standard case conda users end up with?

That's a contrived case for illustration, but it could happen if users install packages that depend on rdkit from both sources.

Thanks for clarifying. I believe the only thing that I can do is to rename the package on PyPI to rdkit. But it seems to me the best solution is to use either conda or pip and not mix them.

> That said, it would still be nice to be able to pip install rdkit so I think asking @greglandrum about that is a good idea.

@kuelumbus I am happy to do this; send me email and we can figure out how to do the transfer

Ha! I see you have already done so. :-)

C-nit commented

Came here to suggest using rdkit as distribution name for the wheel releases. I'm super happy to see it happening :)

Developers being able to declare a dependency simply on rdkit right away sets them up for a future where the conda and pip installs are properly aware of each other.

As @skearnes was saying, conda and pip install to the same location in site-packages. This seems weird, but if you consider the second install as an update it makes sense. The problem is that the tools might not be properly aware of the existing version.

On the conda side, there is a configuration option that will fix this once the new distribution name is in use: https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/pip-interoperability.html

However, most of the time the pip installs come after the conda ones, and here the problem comes from the rdkit build not installing the package properly: it is missing the distribution info. Long story short, after a conda install alone, pip list doesn't show rdkit.

I came across the issue with conda+pip before and commented on the situation in the conversation on an older PR that was about packaging: rdkit/rdkit#2690 (comment)

TL;DR
Sorry for the long message and back-seat commenting. I just hope to make everyone aware that moving the PyPI name is really great but only half the fix. Until the rdkit source install (conda) is fixed, pip will continue to always overwrite.

I just uploaded the recent RDKit version to https://pypi.org/project/rdkit/. You should now be able to install RDKit using

pip install rdkit

I am planning to keep the rdkit-pypi and rdkit projects on PyPI in sync for some time, but to retire rdkit-pypi in the future.

awesome. Thanks @kuelumbus

Thanks @kuelumbus and @greglandrum! I'll send a PR soon to update the RDKit installation docs.

Ah I see @kuelumbus already submitted a PR (rdkit/rdkit#5373); thanks!