Watermark fails to identify (some) packages when imported as 'from X import Y'
adrien-perello opened this issue · 16 comments
Oh yeah, this looks like a bug. Not sure what is different about sympy that causes this to happen. Have to look into this.
Thanks for looking into this. Yeah, it looks weird, and I don't know why sympy only shows up in the list of imports if it is imported directly. Taking a quick look at the package structure at https://github.com/sympy/sympy/tree/master/sympy, it looks like it uses the regular package conventions, with __init__.py files in the submodules. In other words, based on my brief look, SymPy doesn't seem to use any custom hacks for imports.
Thanks. I am not sure how to resolve this, or whether it can be resolved at all. I am hoping someone might know a solution.
The same issue also occurs with the lmfit package: it works when importing the entire package, exactly as described above. The other packages are loaded as checks; notice that there is no such problem for scipy. I'm using the latest version of watermark, v2.3.0 (January 3, 2022).
Thanks for the note. I wish I knew a good way to fix that :(
Joining the chorus here, as I noticed the pytorch_lightning package has the same issue:
import watermark
from pytorch_lightning import seed_everything

if __name__ == '__main__':
    seed_everything(42)
    print(watermark.watermark(machine=True, globals_=globals(), iversions=True, python=True))
Thanks for posting. Still don’t know a good solution for this 😢
Another one brought me here:
from halo import HaloNotebook as Halo
Yet from sklearn import metrics in the demo works, despite koaning's example with sklearn.linear_model.
Still don’t know a good solution for this 😢
I did something similar some time ago: I extracted the imported modules from the code cells of a Jupyter notebook with regular expressions, but I don't know if this is a good solution for this issue.
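A minimal sketch of that regex approach, assuming the standard .ipynb JSON layout (a top-level "cells" list whose code cells carry their text in "source", either as a list of lines or a single string). The helper names imports_from_notebook and imports_from_notebook_file are hypothetical and not part of watermark.

```python
import json
import re

# Anchored at the start of a line so that "from X import Y" is not also
# picked up by the plain "import" pattern as a package named Y.
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)", re.MULTILINE)
FROM_RE = re.compile(r"^\s*from\s+([\w.]+)\s+import\b", re.MULTILINE)


def imports_from_notebook(nb):
    """Return the top-level package names imported in a notebook's code cells."""
    found = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # .ipynb files usually store "source" as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        for match in IMPORT_RE.findall(source) + FROM_RE.findall(source):
            top = match.split(".")[0]
            if top:  # relative imports such as "from . import x" yield ""
                found.add(top)
    return found


def imports_from_notebook_file(path):
    """Load an .ipynb file from disk and extract its imports."""
    with open(path, encoding="utf-8") as fh:
        return imports_from_notebook(json.load(fh))
```

Like any regex approach, this only sees the first name in import a, b and can be fooled by import lines inside multi-line strings.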
Thanks for sharing though! Hm yeah, I feel like there must be a direct way in Python somehow ...
I think the issue is that an import statement of the form from module_name import function_name is not covered by watermark_self.shell.user_ns or globals_. I have tested this with the following import statements:
from sklearn.metrics import accuracy_score # as ...
from pandas import DataFrame # as ...
import re # as ...
from os import path # as ...
import scipy.cluster # as ...
and took a closer look at the output of ns in watermark.py:
...
    if args['iversions']:
        if watermark_self:
            ns = watermark_self.shell.user_ns
        elif globals_:
            ns = globals_
        else:
            raise RuntimeError(
                "Either `watermark_self` or `globals_` must be provided "
                "to show imported package versions."
            )
        print(ns)  # CHANGE: PRINT OUTPUT
        output.append(_get_all_import_versions(ns))
...
Looking at the output of ns, I don't see a way to extract the module name directly for an import statement of the form from module_name import function_name, because it seems to bind only the function (<function ...>) in the namespace, not the underlying module (<module ...>); see the case of accuracy_score from sklearn. But if you import a submodule, as in from module_name import submodule_name or import module_name.submodule_name, the module does appear in the output (<module ...>); see scipy. So, in conclusion, extracting the module name does not work when importing a function from a module, but it does work when importing a submodule from a module.
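To make this observation concrete, here is a small stdlib-only sketch. The packages json, email, xml and re are my stand-ins for sympy, lmfit and the others from the reports; they are not from the thread. The imports are run into a fresh dict (standing in for globals() or user_ns), and then the same ModuleType filter that _get_all_import_versions uses is applied.

```python
import types

# A fresh dict stands in for the notebook's globals() / user_ns.
ns = {}
exec(
    "import re\n"
    "from json import dumps\n"     # binds a function, not a module
    "from email import message\n"  # binds the submodule email.message
    "import xml.dom\n",            # binds the top-level package xml
    ns,
)

# exec adds a "__builtins__" entry automatically; ignore it.
user_items = {name: val for name, val in ns.items() if name != "__builtins__"}

for name, val in sorted(user_items.items()):
    kind = "module" if isinstance(val, types.ModuleType) else type(val).__name__
    print(f"{name:10} {kind}")

# Same filter as the module scan in _get_all_import_versions:
top_level = {
    val.__name__.split(".")[0]
    for val in user_items.values()
    if isinstance(val, types.ModuleType)
}
print(sorted(top_level))
```

It lists dumps as a function and message, re and xml as modules, so the final set is ['email', 're', 'xml']: json never shows up, even though it was imported.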
I have played around a little with the code in watermark.py and changed the following things:
- Added the regular-expression import:
import re
- Updated the _get_all_import_versions(vars) function with regular expressions that match import statements:
def _get_all_import_versions(vars):
    to_print = {}
    imported_pkgs = {
        val.__name__.split(".")[0]
        for val in list(vars.values())
        if isinstance(val, types.ModuleType)
    }
    ### CHANGES START
    # Anchor both patterns at the start of a line so that
    # "from X import Y" does not also register Y as a package.
    import_pattern = re.compile(r"^\s*import\s+([\w\.]+)", re.MULTILINE)
    from_pattern = re.compile(r"^\s*from\s+([\w\.]+)\s+import", re.MULTILINE)
    # _ih is IPython's input history; it is absent when globals_ comes
    # from a plain Python script, so fall back to an empty list.
    for code in vars.get("_ih", []):
        for match in import_pattern.findall(code) + from_pattern.findall(code):
            imported_pkgs.add(match.split(".")[0])
    ### CHANGES END
    imported_pkgs.discard("builtins")
    for pkg_name in imported_pkgs:
        pkg_version = _get_package_version(pkg_name)
        if pkg_version not in ("not installed", "unknown"):
            to_print[pkg_name] = pkg_version
    return to_print
This is just a showcase, and I still don't know why os is not in the output (same without the changes). What do you think?
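As an alternative to the regular expressions, the same extraction could be done with Python's ast module, which parses each cell instead of pattern-matching it, so comments, strings, aliases and multi-name imports are handled. This is a swapped-in technique, not what watermark does, and imported_top_level_names is a hypothetical helper; inside _get_all_import_versions it could be fed vars.get("_ih", []) in place of the regex loop.

```python
import ast


def imported_top_level_names(cells):
    """Collect top-level package names from import statements in code strings."""
    found = set()
    for code in cells:
        try:
            tree = ast.parse(code)
        except SyntaxError:
            # Skip anything that is not valid Python (e.g. an untransformed magic)
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # "import a.b as x, c" -> one alias per imported name
                for alias in node.names:
                    found.add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom):
                # level > 0 is a relative import ("from . import x"); skip it
                if node.level == 0 and node.module:
                    found.add(node.module.split(".")[0])
    return found
```

Cells that fail to parse are skipped rather than raising, and relative imports are ignored because they do not name an installed package.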