indygreg/PyOxidizer

Stop requiring `__file__` in Python packages

indygreg opened this issue · 17 comments

The Problem

Many Python modules and scripts use __file__ to derive the filesystem path to the current file.

As documented at https://docs.python.org/3/reference/datamodel.html (search for __file__), __file__ is optional (the __file__ attribute may be missing for certain types of modules).

However, because Python has traditionally relied on filesystem-based imports and hasn't had a stable story around non-module resource handling, __file__ is almost always defined and has been used to locate and load files next to Python source files for seemingly forever. This is arguably tolerable. But reliance on __file__ undermines tools - like PyOxidizer - which don't import Python modules from the filesystem. This in turn constrains the flexibility and utility of the larger Python ecosystem.

The Solution

Python code should be rewritten to not assume the existence of __file__. By doing so, Python code will be more compatible with more Python execution environments (such as PyOxidizer), and this benefits the overall Python ecosystem.

Instructions for writing portable Python code that doesn't rely on __file__ can be found at https://pyoxidizer.readthedocs.io/en/latest/packaging_pitfalls.html#reliance-on-file.
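
As a rough illustration of the idea (not taken from the linked documentation), here is a minimal sketch that loads a data file through importlib.resources instead of building a path from __file__. The package name demo_pkg and the file greeting.txt are hypothetical and are created on the fly only to make the example self-contained; importlib.resources.files() assumes Python 3.9 or newer.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path

# Build a throwaway package (hypothetical name "demo_pkg") with one data file.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "greeting.txt").write_text("hello\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()


def load_greeting() -> str:
    # Ask the import system for the resource instead of computing
    # os.path.dirname(__file__) and opening the file by path.
    return (
        importlib.resources.files("demo_pkg")
        .joinpath("greeting.txt")
        .read_text(encoding="utf-8")
    )


print(load_greeting(), end="")
```

The calling code never touches __file__, so whether the data can be found depends on the package's loader rather than on a filesystem layout.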

This Issue

This issue can serve as a focal point for tracking and coordinating Python packages and tools which currently rely on __file__ but shouldn't. If you file a GitHub issue against a project that relies on __file__, you can reference this issue by typing indygreg/PyOxidizer#69 and provide Python project maintainers with enough context to make informed decisions about the use of __file__ in their projects.

I am having a similar issue just from trying to import a package, for example numpy. So I am not sure how to change my code not to depend on __file__, as the only line of code I have is "import numpy".

I have this issue when I use the repl mode, the eval mode, and when I try to run a script.

Is there a recommended strategy for patching these modules locally to try and get them to work with PyOxidizer? Currently I am unable to find a way to build when I use a module that contains the __file__ variable.

I have set up a venv, downloaded all the packages that I am using locally, and have tried to monkey patch the __file__ variables where I am hitting errors, but it seems like pyoxidizer run is using a cached version of the modules somewhere (unless they get pulled directly from pip each time), since the same error is still showing even though I have removed the __file__ variables.

Is it possible to somehow search and replace the __file__ variable when it is found? I've only just started using this module but it seems like many python packages are probably using it as well. Expecting them to accept pull requests / updates to deprecate this in the near future doesn't really seem all that feasible.

Ok, after doing some more digging around the repo, I found the solution offered for the black example. I didn't see this earlier so I didn't know it existed. The package I'm using is GitPython, so what I did was add:

[[embedded_python_config]]
sys_paths = ["$ORIGIN/lib"]

[[packaging_rule]]
type = "pip-install-simple"
package = "gitpython"
install_location = "app-relative:lib"

Now, the python application that I am developing lives in a virtualenv, so originally I was just using that virtualenv to dictate what packages needed to be rolled up. Is there an equivalent command for install_location = "app-relative:lib" and sys_paths = ["$ORIGIN/lib"] to get this to work with a virtualenv?

it seems like pyoxidizer run is using a cached version of the modules somewhere

Try making a minor (e.g. whitespace) change to the toml file

Running a file with python -i and looking through its globals, I noted that __spec__.origin is the same as __file__, but when I made a test package that prints its value into a PyOxidizer executable, it turns into None. (At least the file runs.)
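
Since both __file__ and __spec__.origin can be missing or None, code that only wants a location for display purposes can look them up defensively. A minimal sketch, assuming a hypothetical helper named module_location (not part of PyOxidizer or the standard library):

```python
import types
from typing import Optional


def module_location(module: types.ModuleType) -> Optional[str]:
    """Return the module's __file__ or spec origin, or None if neither is set."""
    # __file__ is optional, so never read it as a bare attribute.
    location = getattr(module, "__file__", None)
    if location is None:
        # __spec__ may itself be None, and its origin may be None as well.
        # Note that origin is not guaranteed to be a filesystem path.
        spec = getattr(module, "__spec__", None)
        location = getattr(spec, "origin", None)
    return location


# A module created in memory has neither __file__ nor a spec.
in_memory = types.ModuleType("in_memory_demo")
print(module_location(in_memory))

# A module that does carry __file__ reports it unchanged.
on_disk = types.ModuleType("on_disk_demo")
on_disk.__file__ = "/hypothetical/path/on_disk_demo.py"
print(module_location(on_disk))
```

Callers still have to handle the None case themselves; the helper only avoids the AttributeError.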

Having a similar problem, PyInstaller uses this workaround:

A Python error trace will point to the source file from which the archive entry was created (the __file__ attribute from the time the .pyc was compiled, captured and saved in the archive). This will not tell your user anything useful, but if they send you a Python error trace, you can make sense of it.

Because, in the user's mind, if a tool cannot package a script that worked before it was packaged, that is the tool's fault and not a case of "that library is accessing an optional attribute without checking".

It would be nice if there was the ability to specify in the toml how __file__ should be handled, choosing between various workarounds that other similar tools use, and ideally choose the workaround on a per-package level.

Workaround strategies would include:

  1. use garbage (this would be enough for pytz, cf. #91, which falls back to pkg_resources, but that depends on fixing #134)
  2. use built name (i.e. the exe name; this is good enough for packages that use __file__ as part of some printout, which might be enough for sentry/raven, cf. #63). I believe this is sort of what Nuitka does, at least for its compiled modules
  3. use source filename (e.g. like PyInstaller)

Likely others exist too.

For any maintainer of a package which has been directed to read this issue for PyOxidizer compatibility, if your use of __file__ is to load package data, the following alternative ways to load data are also not supported:

  • pkg_resources #134
  • pkgutil.get_data #139
  • importlib_metadata #140

It would be good to know what standardised method of loading package data does work.

(update: #53 suggests that importlib.resources and its backport importlib_resources are supposed to work. I saw elsewhere that Greg raised https://bugs.python.org/issue36128 about that. And #128 suggests the support is currently buggy.)

Why not implement a wrapper so that at compile time, __file__ gets replaced with a variable referenced via a data hash with the contents being a byte array?

I'm not sure I see how this helps? __file__ should be a filesystem path and unless we're going to do some interesting shenanigans with open() then we're somewhat limited with what we can achieve there. Switching to the ResourceReader approach (which is what the certifi.contents() change does, for 3.7+) solves the problem in a fairly neat way.

Instructions for writing portable Python code that doesn't rely on __file__ can be found at https://pyoxidizer.readthedocs.io/en/latest/packaging_pitfalls.html#reliance-on-file.

Doesn't seem to be there any more.

For future reference, the pinned link to the "reliance on __file__" document (the doc is missing in v0.7.0)

Why does it have so many problems? It needs the installation of tons of programs that don't work properly.

So, I think I'll put this here even though I think it also goes in #73.

certifi implemented what was supposedly a fix for __file__ issues here. I know @indygreg prompted them about this a bit in this issue.

But, I don't think the fix works for pyoxidizer.

I'm still getting an error that I think is related to the __file__ issues they had originally.

  File "<stdin>", line 1, in <module>
  File "main", line 8, in <module>
  File "httpx", line 2, in <module>
  File "httpx._api", line 3, in <module>
  File "httpx._client", line 11, in <module>
  File "httpx._config", line 54, in <module>
  File "httpx._config", line 59, in SSLConfig
  File "certifi.core", line 37, in where
  File "contextlib", line 112, in __enter__
  File "importlib.resources", line 196, in path
  File "pathlib", line 1022, in __new__
  File "pathlib", line 669, in _from_parts
  File "pathlib", line 653, in _parse_args
TypeError: expected str, bytes or os.PathLike object, not NoneType

I don't understand the intricacies of importlib.resources well enough to understand what is happening here, but somehow package.__spec__.origin is None.
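
The last frames of the traceback can be reproduced in isolation. A minimal sketch, assuming (as the traceback suggests, but not verified against the importlib.resources source here) that the failing code builds a Path from the spec's origin; the spec object below is a hypothetical stand-in, not a real ModuleSpec:

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Optional

# Hypothetical stand-in for package.__spec__ when no filesystem origin exists.
spec_without_origin = SimpleNamespace(origin=None)


def package_directory(spec) -> Path:
    # Mirrors the pattern the traceback suggests: build a Path from origin.
    return Path(spec.origin).parent


def package_directory_or_none(spec) -> Optional[Path]:
    # Guarded variant: report "no directory" instead of raising.
    if spec.origin is None:
        return None
    return Path(spec.origin).parent


try:
    package_directory(spec_without_origin)
    raised = None
except TypeError as exc:
    raised = exc

print(type(raised).__name__)
print(package_directory_or_none(spec_without_origin))
```

This only shows where the TypeError comes from once origin is None; it does not say whether the loader or the caller should be the one to change.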

Is this a pyoxidizer problem or a problem with certifi's usage of importlib.resources?

Same issue here; I can't really import anything beyond the stdlib because of the error TypeError: expected str, bytes or os.PathLike object, not NoneType