Wrong package versions within virtualenv
mrbell opened this issue · 17 comments
I'm not sure how widespread this issue is, or if it's something particular to my setup, but when I use watermark to report package information (using -p) within a jupyter notebook that is running in a virtualenv, version information about system level packages are reported rather than packages installed within my environment. If a package is installed in my virtualenv and not installed at the system level, then I end up getting DistributionNotFound
exceptions when I try to report the version number using watermark.
The problem stems from the call to pkg_resources.get_distribution
. If I use get_distribution
directly from within my notebook to report the version of e.g. numpy
it shows my system level numpy info (v 1.11.0) rather than info about numpy installed within my virtualenv (v 1.11.1). For example:
pkg_resources.get_distribution('numpy')
numpy 1.11.0 (/usr/local/lib/python2.7/site-packages)
Similarly
%watermark -p numpy
numpy 1.11.0
But when I check the version of what is imported:
import numpy as np
np.__version__
'1.11.1'
If I use pkg_resources
from a python console or script running within a virtualenv, it reports the proper information about packages. But if I use it from an ipython console or jupyter notebook it reports system level information.
If this is a widespread issue, should watermark
find another more robust solution to getting package information other than using pkg_resources
?
Thanks for the info; it's definitely worth looking into it!
It works just fine for me via conda, so maybe just a few questions before we dig deeper into this issue ... ;)
- Have you installed the latest watermark (1.3.1), the one that is installed as a normal package now instead of using
%install_ext
? Have you had the latter installed previously, and are the maybe some leftovers from this installation in you ipython extension directory? - have you checked that you are running the virtual env's jupyter notebook, e.g., by a
which jupyter
on the command line andimport sys; sys.argv
inside the notebook?
If this is a widespread issue, should watermark find another more robust solution to getting package information other than using pkg_resources?
I am definitely open to improvements if this is indeed a "bug" in this case. Do you have any recommendations? Ideally something that doesn't require external dependencies?
Thanks for the quick response!
- I have installed the latest watermark and have not had previous versions installed.
- I do not have jupyter installed in my environment.
which jupyter
gives/usr/local/bin/jupyter
I ran some tests in fresh environments both with and with jupyter installed in the environment. The test fails without jupyter installed (as reported) but passes when jupyter is installed.
So installing jupyter within the environment fixes the issue. Maybe that's the right thing to do anyway? I haven't ever done it before, though. It's probably worth noting this as a requirement in the watermark documentation. Or maybe adding jupyter as a requirement that is installed along side watermark?
An aside on the ipython console:
I don't know if people use watermark from the ipython console, but if so, simply installing ipython in the virtualenv may not work if the system ipython has been run previously within the virtualenv. Here is an example:
- Create and activate a fresh virtualenv, do not install ipython in the virtualenv
- Launch the ipython console
- the system ipython will be called (but like jupyter it recognizes that you're in a virtualenv so typical imports of virtualenv installed packages are fine)
- Exit ipython
- Now
pip install watermark
, which installs ipython which ipython
will report the virtualenv ipython executable. So we're good, right?- Launch an ipython console
import sys; sys.argv
shows that the system ipython console is still being usedwatermark
/pkg_resources
refer to system level packages, not those in the virtualenv
So installing jupyter within the environment fixes the issue. Maybe that's the right thing to do anyway?
Hm, I usually install ipython/jupyter notebook as well in virtual environments, but maybe that's not a "normal" thing to do ;).
Just to summarize your points to make sure that I understand it correctly:
- if you run the system jupyter notebook in a virtual environment, watermarks gets version numbers from system python packages
- if you run the virtualenv's jupyter notebook in a virtual environment, it watermarks gets version numbers from virtualenv python packages
But you are saying that if you run the system's jupyter notebook in a virtual environment, it will import the virtual environment's packages, not the system python packages, right?
So, I guess what we want watermark to report the package version of the package that the jupyter notebook actually imports, no matter whether it's a system or virtual environment notebook. How to do this elegantly I don't know, yet. Hopefully, there's a way to achieve this via pkg_resources.get_distribution
, I will look a bit around in hope to find a good solution. And please let me know if you have any solution or ideas! :).
But you are saying that if you run the system's jupyter notebook in a virtual environment, it will import the virtual environment's packages, not the system python packages, right?
Right.
we want watermark to report the package version of the package that the jupyter notebook actually imports, no matter whether it's a system or virtual environment notebook
This is the behavior I would expect.
I will give it some thought and let you know if I have any bright ideas.
Unfortunately, I don't have a solution at hand (yet), but I just checked in via conda (running the non-virtual environment jupyter notebook in an active virtual conda environment, and it imports the "correct" package, that is, the package from the active virtual environment. So my guess is it's somehow an issue that is related to virtualenv
specfically, maybe how they handle paths. I also found a related issue on SO (but without solution unfortunately).
Reporting the same package info that is available via pip freeze
should work, right? The installed distributions can be retrieved in code via the pip
package (see e.g. this SO answer)
So that would add pip as a dependency (which seems reasonable), but then _get_packages
could look something like:
if self.out:
self.out += '\n'
packages = pkgs.split(',')
installed_dists = {i.key: i.version for i in pip.get_installed_distributions()}
for p in packages:
self.out += '\n%s %s' % (p, installed_dists[p])
Yeah that could work. Plus I think that an additional flag --list_all_pkg
or so could be added to watermark (would not recommend it as a default though since it may be a long list of packages not relevant to the project in certain cases).
Have you checked if this grabs the correct numpy version in your system jupyter nb, i.e., the virtualenv one?
Let me know if you'd like to submit it as a pull request, otherwise, I am happy to give it at try on the weekend.
I think it would also be worthwhile posting this issue on the virtualenv issue tracker -- I think this is likely not desired behavior (e.g., since pkg_resources.get_distribution
works fine in conda's virtual environment).
Yes, this grabs the correct version for me when I use the system jupyter notebook.
I'll submit a PR later this evening.
Alternatively, we could use something like
import sys
module_names = ['numpy']
for m in module_names:
if hasattr(sys.modules[m], '__version__'):
print(m, sys.modules[m].__version__)
I just see that pip is also using pkg_resources
def get_installed_distributions(local_only=True,
skip=stdlib_pkgs,
include_editables=True,
editables_only=False,
user_only=False):
"""
Return a list of installed Distribution objects.
If ``local_only`` is True (default), only return installations
local to the current virtualenv, if in a virtualenv.
``skip`` argument is an iterable of lower-case project names to
ignore; defaults to stdlib_pkgs
If ``editables`` is False, don't report editables.
If ``editables_only`` is True , only report editables.
If ``user_only`` is True , only report installations in the user
site directory.
"""
if local_only:
local_test = dist_is_local
else:
def local_test(d):
return True
if include_editables:
def editable_test(d):
return True
else:
def editable_test(d):
return not dist_is_editable(d)
if editables_only:
def editables_only_test(d):
return dist_is_editable(d)
else:
def editables_only_test(d):
return True
if user_only:
user_test = dist_in_usersite
else:
def user_test(d):
return True
return [d for d in pkg_resources.working_set
if local_test(d) and
d.key not in skip and
editable_test(d) and
editables_only_test(d) and
user_test(d)
]
I would really prefer a solution without adding additional dependencies here if possible
The sys.modules solution works for me. I'll submit a new PR dropping the pip requirement.
I also have a separate issue (must be 2.7 specific?) using the -w flag. __version__
is called without being referenced to list the watermark version. The fix that I included in PR #16 doesn't work in python 3 apparently. How about adding a _version.py
file with the version string in it and that can be referenced from both __init__.py
and watermark.py
?
Thanks for the PR!
The sys.modules solution works for me. I'll submit a new PR dropping the pip requirement.
Yeah, I think that would be a leaner solution!
I would extend it a bit to cover some edge cases that don't have a __version__
attribute. I think I've seen some older ones with .version
import sys
module_names = ['numpy']
for m in module_names:
s += '%s ' % m
if hasattr(sys.modules[m], '__version__'):
s += sys.modules[m].__version__
elif hasattr(sys.modules[m], 'version'):
s += sys.modules[m].version
else:
s += 'n/a'
s = s.rstrip()
I was just suggesting something like that regarding the version :-)
How do you feel about throwing a custom exception if m
is not in sys.modules
? As it stands a KeyError
would be thrown and that may not too helpful for many people.
How do you feel about throwing a custom exception if m is not in sys.modules? As it stands a KeyError would be thrown and that may not too helpful for many people.
Yeah, great idea!
btw. don't feel rushed with implementing everything, I am probably not able to take a close look at the PR before the weekend!
Oooops, my sys.modules proposal was a bit naive as I just figured out after the pull request. It worked fine first, but then it didn't. So, this method currently only works if the module has been imported _before_ calling watermark. E.g., if watermark is the first cell in a notebook, it won't find the packages. I guess we need to fix that again ... :(
Let's not rush it though and think about some better options (and if nothing else works, just using pip instead)
PS: Reminds me that I want to improve the CI at some point #14
Sorry, didn't mean to rush. I just had time last night and thought I'd take the opportunity to try to take care of it.
Here is an admittedly hacky solution but it works:
for p in packages:
version_str = subprocess.check_output(
'''python -c "import pkg_resources as pr; print(pr.get_distribution('{:}'))"'''.format(p),
shell=True
).rstrip().split()[1]
self.out += '\n%s %s' % (p, version_str)
This calls the virtualenv python, so it returns the correct package info.
I tried combining the for loop into a one-liner call to subprocess but for whatever reason couldn't get it to work, e.g.
subprocess.check_output(
'''python -c "import pkg_resources as pr; print([pr.get_distribution(m) for m in [{:}]])"'''.format(
"'{:}'".format("','".join(packages))
),
shell=True
)
No worries! Instead of using subprocess, wouldn't a simpler __import__('numpy')
work?
__import__('numpy')
m, sys.modules['numpy'].__version__
...
EDIT:
Alternatively,
>>> import importlib
>>> importlib.import_module("numpy").__version__
'1.11.1'
would work as well.But I am not sure if there's an advantage vs __import__
>>> __import__('numpy').__version__
'1.11.1'
What do you think about something like:
packages = ['numpy']
for p in packages:
imported = __import__(p)
# will raise an ImportError: No module named p
# so we could remove the PackageNotFoundError(Exception)
# I guess
try:
ver = imported.__version__
except AttributeError:
try:
ver = imported.version
except AttributeError:
try:
ver = imported.version_info
except AttributeError:
ver = 'n\a'
print(ver)
I think this could work if I am not overlooking something :)