indygreg/PyOxidizer

importlib_metadata.PackageNotFoundError: importlib_metadata

jayvdb opened this issue · 10 comments

When I build a package that includes importlib_metadata, importing it from the built binary fails with PackageNotFoundError.

Build output, followed by the import failing in the resulting interpreter:

...

adding embedded resource: changelog.rst
adding embedded resource: index.rst
adding embedded resource: using.rst
adding embedded resource: example-21.12-py3-none-any.whl
adding embedded resource: PKG-INFO
adding embedded resource: SOURCES.txt
adding embedded resource: dependency_links.txt
adding embedded resource: installed-files.txt
adding embedded resource: requires.txt
adding embedded resource: top_level.txt
adding embedded resource: PKG-INFO
adding embedded resource: SOURCES.txt
adding embedded resource: dependency_links.txt
adding embedded resource: installed-files.txt
adding embedded resource: top_level.txt
adding embedded resource: PKG-INFO
adding embedded resource: SOURCES.txt
adding embedded resource: dependency_links.txt
adding embedded resource: installed-files.txt
adding embedded resource: requires.txt
adding embedded resource: top_level.txt
...
package importlib_metadata-0.23-py3.7.egg-info does not exist; excluding resources: ["PKG-INFO", "SOURCES.txt", "dependency_links.txt", "installed-files.txt", "requires.txt", "top_level.txt"]
package more_itertools-7.2.0-py3.7.egg-info does not exist; excluding resources: ["PKG-INFO", "SOURCES.txt", "dependency_links.txt", "installed-files.txt", "top_level.txt"]
package zipp-0.6.0-py3.7.egg-info does not exist; excluding resources: ["PKG-INFO", "SOURCES.txt", "dependency_links.txt", "installed-files.txt", "requires.txt", "top_level.txt"]
...
resolved 4 embedded resource files across 2 packages: {
    "importlib_metadata.docs": {
        "changelog.rst",
        "index.rst",
        "using.rst",
    },
    "importlib_metadata.tests.data": {
        "example-21.12-py3-none-any.whl",
    },
}
>>> import importlib_metadata
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "importlib_metadata", line 547, in <module>
  File "importlib_metadata", line 509, in version
  File "importlib_metadata", line 482, in distribution
  File "importlib_metadata", line 187, in from_name
importlib_metadata.PackageNotFoundError: importlib_metadata
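
The traceback shows version() being called while the module is imported, ending in from_name() raising PackageNotFoundError because no metadata was found. Below is a minimal sketch of the same failure mode, assuming Python 3.8+ and the standard-library importlib.metadata in place of the importlib_metadata backport. The helper safe_version and the distribution name are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def safe_version(dist_name, default="unknown"):
    """Return the installed version of dist_name, or default if no metadata exists."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised when no finder on sys.meta_path yields a matching distribution,
        # which is the situation shown in the traceback above.
        return default


# Hypothetical name that should not be installed anywhere.
print(safe_version("no-such-distribution-for-this-sketch"))
```

A guard like this only hides the symptom. Code that needs entry points or other metadata still requires the metadata to be discoverable.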

Could this be because its setup.py is just:

from setuptools import setup

setup(use_scm_version=True)

and its setup.cfg includes setup_requires=setuptools-scm?

I tried adding setuptools-scm to the project first, but then it is simply added to the list of excluded packages:

package importlib_metadata-0.23-py3.7.egg-info does not exist; excluding resources: ["PKG-INFO", "SOURCES.txt", "dependency_links.txt", "installed-files.txt", "requires.txt", "top_level.txt"]
package more_itertools-7.2.0-py3.7.egg-info does not exist; excluding resources: ["PKG-INFO", "SOURCES.txt", "dependency_links.txt", "installed-files.txt", "top_level.txt"]
package setuptools_scm-3.3.3-py3.7.egg-info does not exist; excluding resources: ["PKG-INFO", "SOURCES.txt", "dependency_links.txt", "entry_points.txt", "installed-files.txt", "top_level.txt", "zip-safe"]
package zipp-0.6.0-py3.7.egg-info does not exist; excluding resources: ["PKG-INFO", "SOURCES.txt", "dependency_links.txt", "installed-files.txt", "requires.txt", "top_level.txt"]

The comment below is possibly related:

// Ignore EGG-INFO directory, as it is just packaging metadata.

Packaging metadata is quite important for a lot of Python code to run properly, and likely this is the same underlying problem with pkg_resources.

> Packaging metadata is quite important for a lot of Python code to run properly

Indeed it is. Entry points are defined there, and they can be used to load plugins:
https://importlib-metadata.readthedocs.io/en/latest/using.html#entry-points
https://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins
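
To make the plugin use case concrete, here is a minimal sketch of entry-point discovery with the standard-library importlib.metadata (Python 3.8+). The helper load_plugins and the group name "myapp.plugins" are hypothetical. The branch on select() is there because the return type of entry_points() differs between Python versions.

```python
from importlib.metadata import entry_points


def load_plugins(group):
    """Load every entry point registered under group and return {name: object}."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports select().
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a dict mapping group name to entry points.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


# "myapp.plugins" is a hypothetical group name used only for illustration.
plugins = load_plugins("myapp.plugins")
print(sorted(plugins))
```

If the distribution metadata cannot be found, as in this issue, the lookup yields nothing for that package and its plugins are silently missing.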

> likely this is the same underlying problem with pkg_resources

Yes. In fact, pkg_resources also doesn't work with PyOxidizer (see #134), and as this issue shows, neither does its replacement, importlib_metadata.

What would the recommended workaround be, other than not using importlib_metadata or pkg_resources at all (and hence giving up on dynamic plugin discovery)?

This doc, https://importlib-metadata.readthedocs.io/en/latest/using.html#extending-the-search-algorithm, explains how to extend the importlib.metadata search algorithm to cover package metadata that is not stored on the file system.

What this means in practice is that to support finding distribution package metadata in locations other than the file system, you should derive from Distribution and implement the load_metadata() method. Then from your finder, return instances of this derived Distribution in the find_distributions() method.

PyOxidizer could store all metadata for the pip-installed packages somewhere inside the binary, then define a custom DistributionFinder that returns custom Distribution objects that can load the metadata from this custom location, and register this custom finder in sys.meta_path where importlib.metadata searches for custom finders.
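
The idea above can be sketched in pure Python as follows. This is only an illustration under stated assumptions, not PyOxidizer's implementation: EMBEDDED_METADATA, InMemoryDistribution, InMemoryFinder and the distribution name are all hypothetical, and the standard-library importlib.metadata (Python 3.8+) is used. On the standard-library Distribution class, the abstract methods are read_text() and locate_file(), so those are what the sketch implements.

```python
import sys
from importlib.metadata import Distribution, DistributionFinder, version

# Hypothetical in-memory store: distribution name -> {metadata file name: text}.
EMBEDDED_METADATA = {
    "demo-embedded-pkg": {
        "METADATA": "Metadata-Version: 2.1\nName: demo-embedded-pkg\nVersion: 1.2.3\n",
    },
}


def _normalize(name):
    """Compare names loosely, treating '-', '_' and '.' as equivalent."""
    return name.lower().replace("-", "_").replace(".", "_")


class InMemoryDistribution(Distribution):
    """Distribution whose metadata files live in a dict instead of on disk."""

    def __init__(self, files):
        self._files = files

    def read_text(self, filename):
        # Return None for missing files, as importlib.metadata expects.
        return self._files.get(filename)

    def locate_file(self, path):
        raise NotImplementedError("files are not on the file system")


class InMemoryFinder(DistributionFinder):
    """Meta path finder that only answers metadata queries, not imports."""

    def find_spec(self, fullname, path=None, target=None):
        return None  # Leave module imports to the other finders.

    def find_distributions(self, context=DistributionFinder.Context()):
        wanted = context.name
        for name, files in EMBEDDED_METADATA.items():
            if wanted is None or _normalize(wanted) == _normalize(name):
                yield InMemoryDistribution(files)


# Register the finder so importlib.metadata consults it.
sys.meta_path.append(InMemoryFinder())
print(version("demo-embedded-pkg"))
```

Here the finder is a separate object for clarity. A finder that already handles imports would return real module specs from find_spec() and add find_distributions() alongside it.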

Ah! I just discovered that there is already a custom meta path finder PyOxidizerFinder that is automatically registered in sys.meta_path, to support importing modules from memory:
https://github.com/indygreg/PyOxidizer/blob/main/docs/pyembed.rst

So, basically, we would just need to implement a find_distributions method on PyOxidizerFinder that returns a compatible Distribution object that can load metadata from memory.
I wish I could help but I'm not very rusty.

Anyway, thanks for this cool project. I hope that you'll consider adding support for this sometime in the future!

Thank you for doing the research on the metadata APIs. It looks like we'll need to teach the built-in meta path importer some new tricks to support this API.

I looked into implementing find_distributions(). However, I ran into enough issues with the documentation that I don't want to sink too much time into it before I get answers from the Python maintainers. I recorded my questions at https://bugs.python.org/issue38594.

Tracing through the Zip & Egg tests may help us answer some of those questions. (They look almost identical to CPython's test_zip.py.)

thought-machine/please#764 seems to be tackling the same problem with PEX and may have a working implementation that passes basic tests, so it might be worth sharing knowledge with them. (https://github.com/facebook/buck also has some interesting PEX work happening internally.)

https://github.com/pantsbuild/pex/blob/16cfd970cba2d9d85b966ca4e50cd23a6d024cdf/pex/finders.py and facebookincubator/xar@b2501b0 seem to reuse the existing pkg_resources.find_distributions rather than creating their own. If I read them correctly, they bundle pkg_resources (and markerlib) as a mandatory component and then present their internal distributions as wheels/eggs. Maybe PyOxidizer can provide Wheel subclasses as a container for the metadata. Using wheels in this way doesn't appear to be supported; cf. pypa/packaging-problems#244, which mentions PyOxidizer.

setuptools "find_distributions" may be relevant.

One of the authors of importlib.metadata appears to have addressed the documentation deficiencies last month.

I just rewrote the format in which resources are embedded in binaries and carved out room to denote package metadata in the embedded resources. It should now be possible to teach the packaging code to discover and include package distribution/metadata files, and for the in-memory importer to expose the API so that importlib.metadata and friends can read them.

The main branch now has some support for importlib.metadata.

The resource scanning code should now properly recognize files in .dist-info and .egg-info directories as package distribution resource files. These files are exposed to Starlark and can be packaged, just like source and bytecode modules can. The files are encoded within the embedded resources blob, just like source and bytecode modules.
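
As a rough illustration of what recognizing these files involves, here is a Python sketch that classifies a relative path as a distribution metadata file. It is not PyOxidizer's actual scanning code. The function name classify_metadata_path and the regular expression are assumptions, based only on the directory names that appear in the build log earlier in this issue.

```python
import re
from pathlib import PurePosixPath

# Directory names such as "zipp-0.6.0.dist-info" or
# "importlib_metadata-0.23-py3.7.egg-info" (pattern is an assumption).
_META_DIR = re.compile(
    r"(?P<name>.+?)-(?P<version>[^-]+)(?:-py[\d.]+)?\.(?P<kind>dist-info|egg-info)"
)


def classify_metadata_path(relative_path):
    """Return (package, version, file name) for a metadata file, else None."""
    parts = PurePosixPath(relative_path).parts
    # Only directory components can be the metadata directory.
    for index, part in enumerate(parts[:-1]):
        match = _META_DIR.fullmatch(part)
        if match:
            remainder = "/".join(parts[index + 1:])
            return match["name"], match["version"], remainder
    return None


print(classify_metadata_path("importlib_metadata-0.23-py3.7.egg-info/PKG-INFO"))
print(classify_metadata_path("importlib_metadata/docs/index.rst"))
```

Attaching each file to a package name and version in this way, rather than treating the metadata directory as a Python package of its own, is what would avoid the "does not exist; excluding resources" messages seen above.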

Support for packaging these files should be robust.

The custom module importer now provides a find_distributions() method that returns custom types that describe a distribution. However, they do not yet implement the complete API as defined by importlib.metadata.Distribution. See https://pyoxidizer.readthedocs.io/en/latest/packaging_importer.html#importlib-metadata-compatibility for details.

We could implement the full API, but it would be a lot of work. I'm inclined to get the current PyOxidizer release out the door and then require Python 3.8 in the next release. At that point, we can use importlib.metadata.Distribution and implement our custom Distribution much more easily. So we're probably looking at version 0.8 for full importlib.metadata support.