rasbt/watermark

Watermark fails to identify (some) packages when imported as 'from X import Y'

adrien-perello opened this issue · 16 comments

See the jupyter notebook outputs below:

Screenshot from 2021-04-09 12-46-50

rasbt commented

Oh yeah, this looks like a bug. Not sure what is different about sympy that causes this to happen. Have to look into this.

It seems like watermark is not the only one to fail at detecting the sympy module when doing from X import Y. Could it be that the problem comes from the sympy module itself?

Firefox_Screenshot_2021-04-12T08-06-54 553Z

rasbt commented

Thanks for looking into this. Yeah, it looks weird, and I don't know why sympy only shows up in the list of imports if it is imported directly. Taking a quick look at the package structure at https://github.com/sympy/sympy/tree/master/sympy, it looks like it uses the regular package conventions, with __init__.py files in the submodules. In other words, based on my brief look, it doesn't look like SymPy uses any custom hacks for imports.

It seems like scikit-learn also has this issue.

image

rasbt commented

Thanks. I am not sure how to resolve this / if this can be resolved. I am hoping someone might know a solution

The same issue also affects the lmfit package: it is detected only when the entire package is imported, exactly as described above. The other packages are loaded as checks; notice that scipy has no such problem. I'm using the latest version of watermark, v2.3.0 (January 3, 2022).

image

rasbt commented

Thanks for the note. I wish I knew a good way to fix that :(

delip commented

Joining the chorus here, as I noticed the pytorch_lightning package has the same issue:

import watermark
from pytorch_lightning import seed_everything

if __name__ == '__main__':
    seed_everything(42)
    print(watermark.watermark(machine=True, globals_=globals(), iversions=True, python=True))

rasbt commented

Thanks for posting. Still don’t know a good solution for this 😢

Another one brought me here:

from halo import HaloNotebook as Halo

Yet 'from sklearn import metrics' in the demo works, despite koaning's example with sklearn.linear_model.

> Still don’t know a good solution for this 😢

I did something similar some time ago: I extracted the imported modules from the code cells of a Jupyter notebook with regular expressions, but I don't know if this is a good solution for this issue.

rasbt commented

Thanks for sharing though! Hm yeah, I feel like there must be a direct way in Python somehow ...

I think the import statement 'from module_name import function_name' is not covered by watermark_self.shell.user_ns or globals_. I have tested this for the following import statements:

from sklearn.metrics import accuracy_score # as ...
from pandas import DataFrame # as ...
import re # as ...
from os import path # as ...
import scipy.cluster # as ...

and took a closer look into the output of ns in watermark.py:

...
        if args['iversions']:
            if watermark_self:
                ns = watermark_self.shell.user_ns
            elif globals_:
                ns = globals_
            else:
                raise RuntimeError(
                    "Either `watermark_self` or `globals_` must be provided "
                    "to show imported package versions."
                )
            print(ns) # CHANGE: PRINT OUTPUT
            output.append(_get_all_import_versions(ns))
...

Viewing the output of ns, I see no way to extract the module name directly for 'from module_name import function_name', because this binds only the function (<function ...) and not the underlying module (<module ...); see the case of accuracy_score from sklearn. But if you import a submodule, as in 'from module_name import submodule_name' or 'import module_name.submodule_name', a module object will be in the output (<module ...); see scipy. So, in conclusion, the current approach cannot recover the module name when a function is imported from a module, but it works when a submodule is imported.
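
The observation above can be reproduced without watermark or IPython. The following is only a minimal sketch using standard-library modules: json, email and xml stand in for sklearn and scipy, and the names ns and top_level_modules are hypothetical, not part of watermark. It applies the same isinstance(val, types.ModuleType) filter that _get_all_import_versions uses. It also prints the function's __module__ attribute, which might be one lead for recovering the package name, although that idea has not been tested against watermark itself.

```python
import types

# Simulate a user namespace by executing import statements into a fresh dict.
ns = {}
exec(
    "from json import dumps\n"    # binds a function, not the json module
    "from email import utils\n"   # binds the submodule email.utils
    "import xml.dom\n",           # binds the top-level module xml
    ns,
)


def top_level_modules(namespace):
    """Top-level names of all module objects found in a namespace dict."""
    found = {
        val.__name__.split(".")[0]
        for val in namespace.values()
        if isinstance(val, types.ModuleType)
    }
    found.discard("builtins")
    return found


found = top_level_modules(ns)
print(sorted(found))            # json is absent: only a function was bound

# The function object still records where it was defined.
print(ns["dumps"].__module__)
```

The module filter sees email and xml but not json, which matches the behaviour reported in this thread for functions imported with 'from X import Y'.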

I have played around a little with the code in watermark.py and changed the following things:

  1. Added regular expressions import
import re
  2. Updated _get_all_import_versions(vars) function with module import expressions
def _get_all_import_versions(vars):
    to_print = {}
    imported_pkgs = {
        val.__name__.split(".")[0]
        for val in list(vars.values())
        if isinstance(val, types.ModuleType)
    }

    ### CHANGES START
    # Also scan the IPython input history ("_ih") for import statements, so
    # that "from X import Y" registers X even when Y is not a module.
    import_pattern = re.compile(r"import\s+([\w\.]+)")
    from_pattern = re.compile(r"from\s+([\w\.]+)\s+import")

    # "_ih" only exists in IPython namespaces; fall back to an empty history
    # so that a plain globals() dict does not raise a KeyError.
    for code in vars.get("_ih", []):
        for match in import_pattern.findall(code) + from_pattern.findall(code):
            imported_pkgs.add(match.split(".")[0])
    ### CHANGES END

    imported_pkgs.discard("builtins")

    for pkg_name in imported_pkgs:
        pkg_version = _get_package_version(pkg_name)
        if pkg_version not in ("not installed", "unknown"):
            to_print[pkg_name] = pkg_version
    return to_print

This is just a showcase, and I still don't know why os is not in the output (same without the changes). What do you think?
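
One caveat with the regular expressions above is that they match any text that looks like an import, including comments and string literals, and the plain import pattern also matches the 'import Y' tail of 'from X import Y'. As a possible alternative, here is a hedged sketch that parses the history with the standard ast module instead. The function name imported_top_level_names and the sample history list are made up for illustration; this is not watermark code. Cells that are not valid Python (for example, cells containing IPython magics) are simply skipped, which is a known limitation of the sketch.

```python
import ast


def imported_top_level_names(cells):
    """Collect top-level package names from import statements in source strings."""
    names = set()
    for source in cells:
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Not plain Python (e.g. an IPython magic); skipped in this sketch.
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    names.add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom):
                # Ignore relative imports such as "from . import x".
                if node.module and node.level == 0:
                    names.add(node.module.split(".")[0])
    return names


# Hypothetical stand-in for the IPython input history list.
history = [
    "",
    "from sklearn.metrics import accuracy_score",
    "import scipy.cluster\nfrom os import path",
    "# import nothing here\nx = 'from fake import y'",
    "%matplotlib inline",
]
print(sorted(imported_top_level_names(history)))
```

Parsing does not import anything, so the packages named in the history do not need to be installed for the sketch to run. As for os: it is a standard-library module without a __version__ attribute, so if _get_package_version reports it as "unknown", the final filter would drop it. That is only a guess at the cause, not something verified here.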