DerwenAI/kglab

Gotcha: numpy version conflict if installing in existing environment with tensorflow 2.4.1

CatChenal opened this issue · 4 comments

Problem:
When installing kglab using pip in an existing (activated) environment, the latest version of numpy is installed (because requirements.txt includes 'numpy >= 1.19.4'). This may create conflicts with other packages.

Specific Case: latest numpy version and tensorflow 2.4.1 version conflict:
My activated env contains tensorflow 2.4.1.
Near the end of the installation process from pip install kglab, I got this error message (abbreviated):

[...]
Installing collected packages: 
[...], kglab
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.2
    Uninstalling numpy-1.19.2:
      Successfully uninstalled numpy-1.19.2
**ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
  tensorflow 2.4.1 requires numpy~=1.19.2, but you have numpy 1.20.2 which is incompatible.**
Successfully installed [all needed]
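
For readers unfamiliar with the `~=` operator in that message: under PEP 440 it is the "compatible release" clause, so `numpy~=1.19.2` means roughly `>= 1.19.2, == 1.19.*`. The following is a minimal sketch of that rule, not pip's actual resolver logic. The helper names (`parse_release`, `is_compatible_release`) are hypothetical, and the sketch assumes plain numeric versions only (no pre-releases, epochs, or local tags).

```python
def parse_release(version: str) -> tuple:
    """Turn a plain release string such as '1.19.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_release(candidate: str, base: str) -> bool:
    """Approximate the PEP 440 check `candidate ~= base` for plain releases.

    Assumption: both strings contain only dot-separated integers.
    """
    cand, ref = parse_release(candidate), parse_release(base)
    if len(ref) < 2:
        raise ValueError("~= needs at least two release segments")
    prefix = ref[:-1]  # every segment except the last must match exactly
    return cand >= ref and cand[:len(prefix)] == prefix


# Versions taken from the pip output above, checked against tensorflow's pin.
for candidate in ("1.19.2", "1.19.4", "1.20.2"):
    print(candidate, is_compatible_release(candidate, "1.19.2"))
```

Under these assumptions, 1.19.2 and 1.19.4 pass the check while 1.20.2 fails it, which matches the conflict pip reported.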

My fix:

  1. pip uninstall numpy
  2. pip install numpy==1.19.4

My (minimal) tests:

Suggestion/Question:
Perhaps changing the numpy requirement from 'numpy >= 1.19.4' to 'numpy == 1.19.4' would force pip to install this first compatible version instead of the latest?

Thank you @CatChenal !

Yes, I've seen a related problem in my Ray tutorials, where TF is causing issues with the later versions of NumPy (1.20.x).

That's the best workaround that I could see, too.

For dependencies, we prefer to pin the versions using ranges.
Would it help if we pinned to >= 1.19, < 1.20 for now?

In general I'm reluctant to place an upper bound, since some people don't use TF and they need the latest NumPy for other integration purposes. Plus, I suspect that TF will catch up, eventually. Pandas and Arrow have some similar issues w.r.t. RAPIDS, although the latter is planning to catch up in the next release.
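
To make the trade-off concrete, here is a rough sketch of how an open-ended requirement and a bounded range select different versions. It is an illustration only: the function names are hypothetical, the candidate list is a hand-picked sample rather than a query against PyPI, and only plain numeric versions are handled.

```python
def release(version: str, width: int = 3) -> tuple:
    """Parse '1.19' or '1.19.4' into an int tuple, zero-padded to `width`."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(candidate: str, lower: str, upper: str = None) -> bool:
    """True if lower <= candidate, and candidate < upper when upper is given."""
    if release(candidate) < release(lower):
        return False
    return upper is None or release(candidate) < release(upper)


def newest_allowed(candidates, lower, upper=None):
    """Return the highest candidate inside the range, or None if none fit."""
    allowed = [c for c in candidates if in_range(c, lower, upper)]
    return max(allowed, key=release) if allowed else None


# Illustrative sample of NumPy versions (an assumption, not a PyPI listing).
sample = ["1.19.2", "1.19.4", "1.19.5", "1.20.2"]

print(newest_allowed(sample, "1.19.4"))        # open-ended: >= 1.19.4
print(newest_allowed(sample, "1.19", "1.20"))  # bounded: >= 1.19, < 1.20
```

With this sample, the open-ended form selects 1.20.2 and the bounded form selects 1.19.5, which is the behaviour difference being weighed here.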

Also, I'll add a note in the (upcoming) FAQ.

Thanks @ceteri.

"Would it help if we pinned to >= 1.19, < 1.20 for now?"

I would hold off for the moment:

  1. My 'suggestion' should have been just a question (my bad!).
  2. I was a bit too hasty in installing a brand new package in an existing environment (end user problem).
  3. For my specific TensorFlow/Numpy conflict, I now know what the requirements are in the current TF 'REQUIRED_PACKAGES'.
  4. Until I test kglab with numpy==1.19.4 exhaustively my "fix" is just a plausible hypothesis (not even a workaround)!
  5. I assume that fixing the upper bound of the Numpy version would require an audit of all the packages in requirements.txt to get their own Numpy dependency, then use the highest [a]. Is there another way?
    [a]. It seems that pip is installing the highest release (i.e. Numpy 1.20.2, which is 23 days old as of this post): the audit would tell which package needs it (if any).
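
One possible way to run such an audit, sketched with only the standard library (`importlib.metadata`, available in Python 3.8+): list every installed distribution that declares a requirement on NumPy. Note the assumptions: this inspects what is installed in the current environment rather than reading requirements.txt, the helper names are hypothetical, and the name matching is a simple regex, not a full requirement parser.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Extract a normalized project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if not match:
        return ""
    # Normalize case and separators so 'NumPy' and 'numpy' compare equal.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def who_requires(target: str) -> dict:
    """Map each installed distribution to its declared requirements on target."""
    wanted = requirement_name(target)
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or "<unknown>"
        for req in dist.requires or []:  # requires is None if nothing declared
            if requirement_name(req) == wanted:
                found.setdefault(name, []).append(req)
    return found


if __name__ == "__main__":
    for name, reqs in sorted(who_requires("numpy").items()):
        print(name, "->", reqs)
```

Run inside the activated environment, this prints the raw requirement strings (including any environment markers) so the tightest NumPy constraint can be spotted by eye.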

Perhaps a warning box in README would suffice, e.g.:

WARNING on Installing kglab in an existing environment:
Installing a new package in an existing environment may reveal or create version conflicts. See the requirements of kglab in requirements.txt before you do. A known version conflict is that of Numpy in kglab (>= 1.19.4) and TensorFlow 2+ (~=1.19.2).

Just did a roll back of the NumPy requirement, so this should work fine with >= 1.19.2 now.
Also added your language above as a warning, along with notes about the associated PEP 517 errors that may come up.

Many thanks @CatChenal !