pdoc3/pdoc

Error if an extensition has been compiled for several python version.

nennigb opened this issue · 6 comments

Expected Behavior

I am documenting a package containing fortran extension build with f2py/gfortran/gcc on linux.
Once the package is installed the root folder contains mypkg.cpython-39-x86_64-linux-gnu.so which is imported in the __init__.py file.
If I run pdoc, it works fine ;-)

# Here I use py39
pdoc3 --force --config latex_math=True --html pypolsys
/home/xyz/anaconda3/envs/work/lib/python3.9/site-packages/pdoc/__init__.py:754: UserWarning: Couldn't read PEP-224 variable docstrings from <Module 'pypolsys.polsys'>: source code not available
  m = Module(import_module(fullname),
html/pypolsys/index.html
html/pypolsys/polsys.html
html/pypolsys/test.html
html/pypolsys/utils.html
html/pypolsys/version.html

I am using several python version (the package are installed in editable mode) and I can have also mypkg.cpython-38-x86_64-linux-gnu.so or mypkg.cpython-310-x86_64-linux-gnu.so in the same place. Then pdoc crashes.

# Here I use py39
pdoc3 --force --config latex_math=True --html pypolsys
/home/xyz/anaconda3/envs/work/lib/python3.9/site-packages/pdoc/__init__.py:754: UserWarning: Couldn't read PEP-224 variable docstrings from <Module 'pypolsys.polsys'>: source code not available
  m = Module(import_module(fullname),
Traceback (most recent call last):
  File "/home/xyz/anaconda3/envs/work/lib/python3.9/site-packages/pdoc/__init__.py", line 222, in import_module
    module = importlib.import_module(module_path)
  File "/home/xyz/anaconda3/envs/work/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 981, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'pypolsys.polsys.cpython-310-x86_64-linux-gnu'; 'pypolsys.polsys' is not a package

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xyz/anaconda3/envs/work/bin/pdoc3", line 8, in <module>
    sys.exit(main())
  File "/home/xyz/anaconda3/envs/work/lib/python3.9/site-packages/pdoc/cli.py", line 534, in main
    modules = [pdoc.Module(module, docfilter=docfilter,
  File "/home/xyz/anaconda3/envs/work/lib/python3.9/site-packages/pdoc/cli.py", line 534, in <listcomp>
    modules = [pdoc.Module(module, docfilter=docfilter,
  File "/home/xyz/anaconda3/envs/work/lib/python3.9/site-packages/pdoc/__init__.py", line 754, in __init__
    m = Module(import_module(fullname),
  File "/home/xyz/anaconda3/envs/work/lib/python3.9/site-packages/pdoc/__init__.py", line 224, in import_module
    raise ImportError(f'Error importing {module!r}: {e.__class__.__name__}: {e}')
ImportError: Error importing 'pypolsys.polsys.cpython-310-x86_64-linux-gnu': ModuleNotFoundError: No module named 'pypolsys.polsys.cpython-310-x86_64-linux-gnu'; 'pypolsys.polsys' is not a package

It is noteworthy that arbitrary .so file like abcd.so is also a problem.
We could expect that pdoc ignore such additional .so file (filter by python version, just pick one, ...)

Steps to Reproduce

  1. Pick a python package with fortran extension
  2. Copy and rename the .so with another python version to add a second .so file
  3. Call pdoc3 mypkg

Additional info

  • pdoc version:
    pdoc 0.10.0
kernc commented

Since you seem to have already investigated it some, care to propose a fix here:

pdoc/pdoc/__init__.py

Lines 196 to 235 in 2cce30a

def import_module(module: Union[str, ModuleType],
*, reload: bool = False) -> ModuleType:
"""
Return module object matching `module` specification (either a python
module path or a filesystem path to file/directory).
"""
@contextmanager
def _module_path(module):
from os.path import abspath, dirname, isfile, isdir, split
path = '_pdoc_dummy_nonexistent'
module_name = inspect.getmodulename(module)
if isdir(module):
path, module = split(abspath(module))
elif isfile(module) and module_name:
path, module = dirname(abspath(module)), module_name
try:
sys.path.insert(0, path)
yield module
finally:
sys.path.remove(path)
if isinstance(module, Module):
module = module.obj
if isinstance(module, str):
with _module_path(module) as module_path:
try:
module = importlib.import_module(module_path)
except Exception as e:
raise ImportError(f'Error importing {module!r}: {e.__class__.__name__}: {e}')
assert inspect.ismodule(module)
# If this is pdoc itself, return without reloading. Otherwise later
# `isinstance(..., pdoc.Doc)` calls won't work correctly.
if reload and not module.__name__.startswith(__name__):
module = importlib.reload(module)
# We recursively reload all submodules, in case __all_ is used - cf. issue #264
for mod_key, mod in list(sys.modules.items()):
if mod_key.startswith(module.__name__):
importlib.reload(mod)
return module

or here:

pdoc/pdoc/__init__.py

Lines 719 to 768 in 2cce30a

# If the module is a package, scan the directory for submodules
if self.is_package:
def iter_modules(paths):
"""
Custom implementation of `pkgutil.iter_modules()`
because that one doesn't play well with namespace packages.
See: https://github.com/pypa/setuptools/issues/83
"""
from os.path import isdir, join
for pth in paths:
for file in os.listdir(pth):
if file.startswith(('.', '__pycache__', '__init__.py')):
continue
module_name = inspect.getmodulename(file)
if module_name:
yield module_name
if isdir(join(pth, file)) and '.' not in file:
yield file
for root in iter_modules(self.obj.__path__):
# Ignore if this module was already doc'd.
if root in self.doc:
continue
# Ignore if it isn't exported
if not _is_public(root) and not _is_whitelisted(root, self):
continue
if _is_blacklisted(root, self):
self._skipped_submodules.add(root)
continue
assert self.refname == self.name
fullname = f"{self.name}.{root}"
try:
m = Module(import_module(fullname),
docfilter=docfilter, supermodule=self,
context=self._context, skip_errors=skip_errors)
except Exception as ex:
if skip_errors:
warn(str(ex), Module.ImportWarning)
continue
raise
self.doc[root] = m
# Skip empty namespace packages because they may
# as well be other auxiliary directories
if m.is_namespace and not m.doc:
del self.doc[root]
self._context.pop(m.refname)

?

kernc commented

It looks like something in iter_modules(). inspect.getmodulename(file) recognizes those .so files as valid modules, so the function offers them for importing ... 🤔

yes, module_name = inspect.getmodulename(file) returns a module name
in this function there is a list of all possible suffix given by importlib.machinery.all_suffixes()

['.py', '.pyc', '.cpython-39-x86_64-linux-gnu.so', '.abi3.so', '.so']

when processing the true extension, the suffix '.cpython-39-x86_64-linux-gnu.so' is detected and remove, but when we have the wrong python version, the '.so' is caught and only the '.so' is removed leading to a wrong module name...

Perhaps we may add a extra check to see if the module_name is valid. For instance here it contains .. Not sure if it should be always true at this stage although it is a loop module files...
Another approach is to silent the path import error while keeping invalid python module import error...

I also made a mistake since an arbitrary abcd.so also raise an error (the file was originality in the wrong place :-( I edit my previous message).

kernc commented

I'd rather something more strict like:

if '.cpython-' in module_name:
    continue

PR welcome.
I see with --skip-errors CLI switch, one can already turn this into a warning.

pdoc/pdoc/__init__.py

Lines 753 to 761 in 2cce30a

try:
m = Module(import_module(fullname),
docfilter=docfilter, supermodule=self,
context=self._context, skip_errors=skip_errors)
except Exception as ex:
if skip_errors:
warn(str(ex), Module.ImportWarning)
continue
raise

kernc commented

Can you show example of an error with abcd.so file?

if '.cpython-' in module_name:
   continue

Agree, it will strongly limit the side effect, but I not sure if it will work on other platform. I will have a look.

If abcd.so is an empty file, the error is

    raise ImportError(f'Error importing {module!r}: {e.__class__.__name__}: {e}')
ImportError: Error importing 'pypolsys.abcd': ImportError: /home/xyz/Recherche/CODE/polsysplp/pypolsys/abcd.so: file too short

if abscd.so is a copy of a valid module extension with bad naming convention, the error is

    raise ImportError(f'Error importing {module!r}: {e.__class__.__name__}: {e}')
ImportError: Error importing 'pypolsys.abcd': ImportError: dynamic module does not define module export function (PyInit_abcd)

if the so file is a valid, non extension so file

    raise ImportError(f'Error importing {module!r}: {e.__class__.__name__}: {e}')
ImportError: Error importing 'pypolsys.libstdc++': ModuleNotFoundError: No module named 'pypolsys.libstdc++'

I will be happy to contribute.

ps : I miss the --skip-errors switch !