pypa/setuptools

[BUG] recent versions of setuptools broke editable mode

dvarrazzo opened this issue · 23 comments

setuptools version

65.2

Python version

3.9.13

OS

Ubuntu

Additional environment information

No response

Description

Since a recent release (likely 64, when PEP 660 was implemented), installing psycopg with -e results in a broken package, where psycopg is an empty namespace package.

This is an example of a broken run. This is a similar run whose only difference is installing psycopg without the -e flag, in order to test a dependent package.

Expected behavior

See above

How to Reproduce

A more complete reproduction with stable git reference can be found in tox-dev/tox#2479.

Output

See above

Thank you very much @dvarrazzo for reporting this issue.

Could you please try to create a minimal/standalone reproducer (that does not involve using tox or pytest)?

I tried to do the following:

sudo apt update -y
sudo apt install -y libpq5 libpq-dev
cd /tmp && rm -rf /tmp/psycopg /tmp/empty_dir
git clone https://github.com/psycopg/psycopg.git /tmp/psycopg
cd /tmp/psycopg
python3.8 -m venv .venv
.venv/bin/python -m pip install -U pip
.venv/bin/python -m pip install -e ./psycopg
.venv/bin/python -m pip install -e ./psycopg_c
mkdir /tmp/empty_dir
cd /tmp/empty_dir  # <-- To avoid any problems with automatic injection of CWD into `sys.path`
cat <<EOS | /tmp/psycopg/.venv/bin/python -
import psycopg, psycopg_c
print(psycopg.Cursor)
print(psycopg.AsyncCursor)
print(dir(psycopg_c))
EOS
# Output ==>
# <class 'psycopg.Cursor'>
# <class 'psycopg.AsyncCursor'>
# ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_psycopg', 'pq', 'sys', 'version']

And I can see that the module members mentioned in the tox issue seem to exist.

Is there any chance this happens because Python automatically adds the current working dir to sys.path? (This way, the import system could mistake "/tmp/psycopg/psycopg" for the root of the psycopg package.) I tried to run the following example to investigate this theory:

# Continuation of the previous script
# now inside /tmp/empty_dir
mkdir psycopg
echo "raise SystemError('kaboom!')" > psycopg/__init__.py
/tmp/psycopg/.venv/bin/python -c 'import psycopg'
# Output ==>
# Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File "/tmp/empty_dir/psycopg/__init__.py", line 1, in <module>
#    raise SystemError('kaboom!')
# SystemError: kaboom!

I thought of that, and in my investigation with tox I tried to chdir away, but it didn't fix the problem.

Anyway, does it mean that it is now mandatory to adopt a "src" style project organization?


I don't think so...

In terms of "plain Python" usage, I would say things should not change a lot. The flat layout should have the same flaws as before, right?
Specifically, if you have a folder under CWD with a name coinciding with the installed package, this folder might be imported (but that was already a well-known risk ...).

In terms of pytest, things may change. I know that they try some magic to compensate for the flat-layout flaws... I have not received any report about that yet (maybe this error is the first?).


The limitations of the implementation are documented here. So far the biggest impact is for old-style namespace packages using pkgutil and pkg_resources, but some workarounds are suggested in the docs (e.g., the strict mode should work).

I thought of that, and in my investigation with tox I tried to chdir away, but it didn't fix the problem.

So do you confirm that the package itself works on an editable install, but things start to get weird when a 3rd tool is added to the mix? (Maybe tox, maybe pytest?)

Could it be the case one of these tools relies on a specific assumption about the editable installation? (I know that some static analysis tools will try to read a .egg-link file, which is gone in setuptools>=64...)
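For tools that used to look for a .egg-link file, one alternative is the PEP 610 direct_url.json record that recent pip versions write into the .dist-info directory for installs from a local directory; for editable installs it contains "dir_info": {"editable": true}. A minimal sketch, assuming the installer wrote that file (the helper names are hypothetical, not part of any tool's API):

```python
import json
from importlib import metadata
from urllib.parse import urlparse
from urllib.request import url2pathname


def editable_source_from_direct_url(text):
    """Return the local source directory recorded in a PEP 610 direct_url.json.

    Returns None unless the record says dir_info.editable is true and the URL
    uses the file: scheme.
    """
    data = json.loads(text)
    if not data.get("dir_info", {}).get("editable", False):
        return None
    parsed = urlparse(data["url"])
    if parsed.scheme != "file":
        return None
    # url2pathname decodes %-escapes and converts to the platform's path syntax
    return url2pathname(parsed.path)


def editable_source(dist_name):
    """Hypothetical helper: the directory an installed distribution is editable from, if any."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    text = dist.read_text("direct_url.json")  # None when the installer did not write it
    return editable_source_from_direct_url(text) if text else None
```

After pip install -e ./psycopg, editable_source("psycopg") would be expected to return the path of the ./psycopg project directory, without relying on .egg-link files.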

I can reproduce the problem with:

git clone git@github.com:psycopg/psycopg.git psycopg-test
cd psycopg-test
git checkout a7d39bf740d0d97dbe230d07177463668327431a

python3 -m venv .venv
source .venv/bin/activate
pip install "pip==22.2.2" "setuptools==65.2.0" "wheel==0.37.1"
pip install -e ./psycopg
python -c "import psycopg; psycopg.ProgrammingError" # fails
python -c "import psycopg; print(psycopg)"
# <module 'psycopg' (namespace)>

Changing cwd works. Using pip install psycopg works too:

pip install ./psycopg
python -c "import psycopg; print(psycopg)"
# <module 'psycopg' from '/home/piro/dev/psycopg-test/.venv/lib/python3.8/site-packages/psycopg/__init__.py'>

Using a rather older set of libraries, editable mode works as expected:

pip install -U "setuptools==44.0.0" "pip==20.0.2" "wheel==0.37.1"
pip install -e ./psycopg
python -c "import psycopg; psycopg.ProgrammingError" # works
python -c "import psycopg; print(psycopg)"
# <module 'psycopg' from '/home/piro/dev/psycopg-test/psycopg/psycopg/__init__.py'>

Tested on my laptop using Python 3.8.10 (Ubuntu 20.04 package)

Changing cwd works. Using pip install psycopg works too:

Thank you very much @dvarrazzo. So it is confirmed that this happens because a directory with the package name exists in the CWD...

I can see that a regular installation does not have the same problem. If I had to guess I would say that Python starts by creating a namespace package, but then when it finds the other folder (the actual package) on sys.path and realises it contains a __init__.py file, it "regrets" the decision of having a namespace package and "collapses" it into a regular module.

This behaviour does not seem to be very well defined/coherent though...
In theory, a folder in the CWD should get priority during the import. We can do a series of experiments showing that this holds as long as we don't mix namespace packages and traditional packages (the ones with __init__.py). I need to seek some advice regarding this behaviour.

For the time being let's consider these "accidental namespace packages" as a limitation of the editable mode.
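A toy reproduction of this "accidental namespace package", assuming that the default editable mode relies on an import hook appended to sys.meta_path, so it is only consulted after the regular sys.path scan. HypotheticalEditableFinder and accidental_ns_demo are made up for illustration; this is not setuptools' actual finder:

```python
import importlib
import importlib.abc
import importlib.util
import os
import sys
import tempfile

NAME = "accidental_ns_demo"  # hypothetical package name, stands in for psycopg


class HypotheticalEditableFinder(importlib.abc.MetaPathFinder):
    """Toy stand-in for an editable-install import hook (not setuptools' code)."""

    def __init__(self, name, pkg_dir):
        self.name, self.pkg_dir = name, pkg_dir

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.name:
            return None
        return importlib.util.spec_from_file_location(
            fullname,
            os.path.join(self.pkg_dir, "__init__.py"),
            submodule_search_locations=[self.pkg_dir],
        )


def import_fresh(name):
    """Forget any cached module and import it again."""
    sys.modules.pop(name, None)
    importlib.invalidate_caches()
    return importlib.import_module(name)


def reproduce_accidental_namespace():
    """Return (real package visible with repo on sys.path, visible without it)."""
    with tempfile.TemporaryDirectory() as repo:
        # Layout like the psycopg repository: repo/NAME/NAME/__init__.py
        pkg_dir = os.path.join(repo, NAME, NAME)
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write("answer = 42\n")

        finder = HypotheticalEditableFinder(NAME, pkg_dir)
        sys.meta_path.append(finder)  # runs only if PathFinder finds nothing
        try:
            sys.path.insert(0, repo)  # simulates running "python -c" from the repo
            shadowed = hasattr(import_fresh(NAME), "answer")
            sys.path.remove(repo)
            clean = hasattr(import_fresh(NAME), "answer")
        finally:
            sys.meta_path.remove(finder)
            if repo in sys.path:
                sys.path.remove(repo)
            sys.modules.pop(NAME, None)
    return shadowed, clean


print(reproduce_accidental_namespace())
```

If the assumption holds, this should print (False, True): with the repository directory first on sys.path, PathFinder returns a namespace package built from repo/accidental_ns_demo and the hook never runs; without it, the hook supplies the real package.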

Ok, but what does it mean? Is this a regression that you will address (why would a random directory in cwd be more meaningful than an explicit entry in the pythonpath?) Or is this going to stay broken?

Should we pin a dependency on setuptools < 64 for development? Should we rename our code directory and make all git history operations more painful in order to restore editable mode?

Please let us know.

Ok, but what does it mean? Is this a regression that you will address (why would a random directory in cwd be more meaningful than an explicit entry in the pythonpath?) Or is this going to stay broken?

I am seeking advice about this matter before making any final decision.

Why would a random directory in cwd be more meaningful than an explicit entry in the pythonpath?

This is how Python operates, right? Directories in CWD are meant to take precedence over other things in the site-packages. We can do the following test:

rm -rf /tmp/development
mkdir -p /tmp/development/pkgA/pkgA
cd /tmp/development

touch pkgA/pyproject.toml
echo "x = 42" > pkgA/pkgA/a.py

# --- Simulate regular installation ---
python3.8 -m venv .venv
cp -r pkgA/pkgA .venv/lib/python3.8/site-packages/
tree .venv/lib/python3.8/site-packages/pkgA
# .venv/lib/python3.8/site-packages/pkgA
# └── a.py

# --- Experiment ---
rm -rf /tmp/workdir
mkdir -p /tmp/workdir/pkgA
echo "raise ValueError('kaboom')" > /tmp/workdir/pkgA/a.py
cd /tmp/workdir
/tmp/development/.venv/bin/python -c 'from pkgA import a; print(a.x)'   # <------ error
# ValueError: kaboom

If we add a __init__.py file to both /tmp/development/pkgA/pkgA and /tmp/workdir/pkgA, the results are the same...
So when both packages contain __init__.py (or neither does), the CWD always takes precedence. The only weird case is when one of them contains __init__.py and the other does not...
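The four combinations can be checked without tox or pytest using importlib.machinery.PathFinder, the finder that walks sys.path. A minimal sketch in which two temporary directories stand in for the CWD and for site-packages (the helper names are hypothetical):

```python
import importlib
import importlib.machinery
import os
import tempfile


def make_pkg(root, name, regular):
    """Create root/name/, with an __init__.py only if regular is True."""
    pkg = os.path.join(root, name)
    os.makedirs(pkg)
    if regular:
        open(os.path.join(pkg, "__init__.py"), "w").close()


def who_wins(first_regular, second_regular, name="pkgA"):
    """Return which path entry supplies `name`: 'first', 'second' or 'namespace'."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        make_pkg(first, name, first_regular)    # plays the role of the CWD
        make_pkg(second, name, second_regular)  # plays the role of site-packages
        importlib.invalidate_caches()
        # The same scan the import system runs over sys.path, limited to two entries
        spec = importlib.machinery.PathFinder.find_spec(name, [first, second])
        if spec.origin is None:
            # Namespace package: all portions found are merged into __path__
            return "namespace"
        return "first" if spec.origin.startswith(first + os.sep) else "second"


for combo in [(True, True), (False, False), (False, True), (True, False)]:
    print(combo, who_wins(*combo))
```

Per PEP 420, a directory without __init__.py is only recorded as a possible namespace portion while the scan continues, so a regular package later on the path still wins; that is the mixed case described above.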

Should we pin a dependency on setuptools < 64 for development? Should we rename our code directory and make all git history operations more painful in order to restore editable mode?

You don't need to rename your directory. Right now, you can try pip install -e . --editable-mode strict or pip install -e . --editable-mode compat, maybe that will work for you?

This is how Python operates, right? Directories in CWD are meant to take precedence over other things in the site-packages.

I don't think so, no. This is not how Python used to operate until a couple of weeks ago.

Giving precedence to directories in cwd seems like a design error and a security problem. A program would change its behaviour according to the cwd, and an attacker could convince someone to run a program from their directory to hijack it. It is the same reason why cwd is not in PATH: it has been considered insecure since MS-DOS times.

You don't need to rename your directory. Right now, you can try pip install -e . --editable-mode strict or pip install -e . --editable-mode compat, maybe that will work for you?

I don't see such a thing:

(venv-new) piro@baloo:~/dev/psycopg3$ pip --version
pip 22.2.2 from /home/piro/dev/psycopg3/tmp/venv-new/lib/python3.8/site-packages/pip (python 3.8)
(venv-new) piro@baloo:~/dev/psycopg3$ pip install -e ./psycopg --editable-mode strict

Usage:   

  ...

no such option: --editable-mode

Sorry for that. I never remember it by heart :/

These are the options, copied straight from the docs:

pip install -e . --config-settings editable_mode=strict

pip install -e . --config-settings editable_mode=compat
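If these commands need to be scripted (for example from a tox or CI helper), the argv can be built like this. A minimal sketch; editable_install_cmd is a hypothetical helper, and the accepted modes are just the two quoted above:

```python
import sys

# Modes quoted from the setuptools docs in this thread; None keeps the backend default.
EDITABLE_MODES = (None, "strict", "compat")


def editable_install_cmd(project_dir, mode=None, python=sys.executable):
    """Build the argv for an editable pip install with an optional setuptools mode."""
    if mode not in EDITABLE_MODES:
        raise ValueError(f"unknown editable_mode: {mode!r}")
    cmd = [python, "-m", "pip", "install", "-e", project_dir]
    if mode is not None:
        # pip forwards --config-settings KEY=VALUE to the build backend
        cmd += ["--config-settings", f"editable_mode={mode}"]
    return cmd
```

For example, subprocess.run(editable_install_cmd("./psycopg", mode="strict"), check=True) would run the strict-mode install from the psycopg checkout.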

Both these options work. The documentation says about strict: "The exact details of how this mode is implemented may vary"; about compat: "The compat mode is transitional and will be removed in future versions of setuptools, it exists only to help during the migration period".

So it seems that the default is broken and these options are not reliable solutions.

Hi @dvarrazzo , thank you very much for testing these solutions and confirming that they work.

The implementation details for both strict mode and the default editable installation may vary. They should be equally reliable.

.pth files, import hooks, file links... Those are all different techniques that can be used under the hood. But publicly committing to one specific technique brings no value to setuptools.

Indeed, the opposite is quite true: if we don't commit to a particular mechanism, setuptools can select a different implementation approach as Python and the ecosystem evolve, and potentially overcome existing limitations.
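To illustrate one of those techniques: a plain directory listed in a .pth file is appended to sys.path by the site module, so it takes part in the normal sys.path scan, and a regular package reached that way wins over a namespace portion in the CWD. A minimal sketch; the file and function names are hypothetical, and it says nothing about which technique setuptools picks in a given mode:

```python
import importlib
import importlib.machinery
import os
import site
import sys
import tempfile


def entries_added_by_pth(pth_lines):
    """Return the sys.path entries site.addsitedir() adds for one .pth file."""
    with tempfile.TemporaryDirectory() as site_dir:
        # "hypothetical_editable.pth" is a made-up file name for this sketch
        with open(os.path.join(site_dir, "hypothetical_editable.pth"), "w") as f:
            f.write("\n".join(pth_lines) + "\n")
        saved = list(sys.path)
        try:
            site.addsitedir(site_dir)  # appends site_dir and existing dirs listed in *.pth
            return [p for p in sys.path if p not in saved and p != site_dir]
        finally:
            sys.path[:] = saved  # undo the global side effect


def pth_entry_beats_accidental_namespace(name="pkgA"):
    """True if a package reached via a .pth entry wins over a namespace portion in the CWD."""
    with tempfile.TemporaryDirectory() as repo:
        project = os.path.join(repo, name)  # repo/pkgA: no __init__.py here
        pkg = os.path.join(project, name)   # repo/pkgA/pkgA: the real package
        os.makedirs(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        added = entries_added_by_pth([project])  # the .pth file points at repo/pkgA
        importlib.invalidate_caches()
        # repo plays the CWD (first entry); the .pth entry comes later on the path
        spec = importlib.machinery.PathFinder.find_spec(name, [repo, *added])
        return spec.origin == os.path.join(pkg, "__init__.py")


print(pth_entry_beats_accidental_namespace())
```

It should print True. This would be consistent with the older develop-based install earlier in the thread not showing the problem, assuming that install relied on a .pth entry.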

This should be no different from any functionality implemented in a Python project: the public API is the contract and the exact implementation details may vary. It does not make this functionality more or less reliable.


If you consider that "the default solution is broken", then you also have to consider that the legacy behaviour (python setup.py develop) is broken and that the behaviour in other backends is broken... They are all broken, in different ways.

Editable installs are complicated and all the existing implementation mechanisms in the ecosystem have severe limitations.

If you go around asking people what their requirements for editable installs are, you are going to realise that it is technically impossible to meet all the requirements at the same time. You have to select a subset.

Right now setuptools offers two options for the user to select: the default and strict. They are equally reliable and they each satisfy just a subset of the requirements. But together they should cover the entire list.

@abravalheri I have reported a regression. Things were working previously and now they don't. I appreciate the hard work behind the hard problem of packaging, don't get me wrong. Regressions happen.

It is not true that a local directory used to have, or even has, precedence over installed packages:

  • this wasn't the case in the past
  • this is not the case with packages installed without -e.

Since a certain version, the default behaviour of editable packages changed: it now presents a noticeable difference w.r.t. normally-installed packages, and it makes developing certain repository layouts difficult (for instance, a monorepo where the directory name matches the package name, like the psycopg 3 repository).

So, this is a regression. My questions are:

  • is this regression wanted? As in, is the current behaviour the wanted behaviour? If so please document it (but with a WARNING, not with obscure options which need a pass-through pip command to activate, hence don't even appear in pip --help) It would seem exceedingly bizarre to me. I invite you to look at the psycopg readme: we used to suggest to hack on the project using -e, whereas now we should explain setuptools internals.

  • is this regression unexpected? It sounds so to me, because other modes work differently (non-editable install, strict-editable, compat-editable). If so, is it expected to be fixed and can it become part of the spec/tests?

Thank you

I don't think so, no. This is not how Python used to operate until a couple of weeks ago.

Maybe I oversimplified things, used the wrong words, and accidentally made things more confusing, sorry for that.

There are a few ways that you can run Python and having something automatically inserted as the first entry to sys.path:

  • python path/to/script.py -- Python will automatically insert dirname('path/to/script.py') as the first entry to sys.path (reference).
  • python -c 'script' -- Python will automatically insert CWD as the first entry to sys.path (reference).
  • python -m module_name -- Python will automatically insert CWD as the first entry to sys.path (reference).

This is also documented in sys.path.
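The three rules can be checked from Python itself. A minimal sketch (Python 3.7+, where -m inserts the full CWD path rather than an empty string); probe.py and probe_mod.py are throwaway files created for the test:

```python
import ast
import os
import subprocess
import sys
import tempfile

PROBE = "import sys; print(repr(sys.path[0]))"


def first_entry(args, cwd):
    """Run this interpreter as `python *args` inside cwd and return its sys.path[0]."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # on 3.11+ this variable suppresses the insertion
    out = subprocess.run([sys.executable, *args], cwd=cwd, env=env,
                         capture_output=True, text=True, check=True)
    return ast.literal_eval(out.stdout.strip())


def same_dir(a, b):
    """Compare two paths after resolving symlinks (e.g. /tmp on macOS)."""
    return os.path.realpath(a) == os.path.realpath(b)


def check_path0_rules():
    with tempfile.TemporaryDirectory() as cwd:
        tools = os.path.join(cwd, "tools")
        os.makedirs(tools)
        script = os.path.join(tools, "probe.py")
        for target in (script, os.path.join(cwd, "probe_mod.py")):
            with open(target, "w") as f:
                f.write(PROBE + "\n")
        return {
            "python -c: empty string (the CWD)": first_entry(["-c", PROBE], cwd) == "",
            "python script.py: the script's directory": same_dir(first_entry([script], cwd), tools),
            "python -m: the CWD": same_dir(first_entry(["-m", "probe_mod"], cwd), cwd),
        }


for rule, holds in check_path0_rules().items():
    print(f"{rule}: {holds}")
```

On 3.11+, -P or PYTHONSAFEPATH turns this insertion off, which is why the helper removes PYTHONSAFEPATH from the child's environment.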

There are a few ways you can avoid that.

After a quick search I could not find anything in the documentation explicitly about the order in which the import machinery traverses sys.path (maybe if I had looked harder I could have found something). The behaviour we can see in practice is the one I reported in #3557 (comment).


Giving precedence to directories in cwd seems like a design error and a security problem. A program would change its behaviour according to the cwd, and an attacker could convince someone to run a program from their directory to hijack it. It is the same reason why cwd is not in PATH: it has been considered insecure since MS-DOS times.

I think you are correct here. In fact with some social engineering this can be possible.
Let's take pip's example. From their docs the recommended way of running pip is the following:

python -m pip <pip arguments>

This makes the following attack possible:

rm -rf /tmp/workdir
mkdir -p /tmp/workdir/pip/
cd /tmp/workdir
touch pip/__init__.py
echo 'print("... doing malicious stuff...")' > pip/__main__.py
python -m pip --help
# ... doing malicious stuff...
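The same attack can be replayed harmlessly, together with one mitigation: isolated mode (-I) keeps the script directory or CWD out of sys.path (on Python 3.11+, -P or PYTHONSAFEPATH does only that part). A minimal sketch; hijack_demo is a made-up module name standing in for pip:

```python
import os
import subprocess
import sys
import tempfile


def run_module(cwd, *flags, module="hijack_demo"):
    """Run `python [flags] -m module` inside cwd and return the CompletedProcess."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # keep the default behaviour for the comparison
    return subprocess.run([sys.executable, *flags, "-m", module],
                          cwd=cwd, env=env, capture_output=True, text=True)


def compare_default_and_isolated():
    with tempfile.TemporaryDirectory() as workdir:
        # Harmless stand-in for the malicious "pip" package in the shell example
        pkg = os.path.join(workdir, "hijack_demo")
        os.makedirs(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        with open(os.path.join(pkg, "__main__.py"), "w") as f:
            f.write('print("... doing malicious stuff...")\n')
        default = run_module(workdir)
        isolated = run_module(workdir, "-I")  # isolated mode: CWD not added to sys.path
    return default, isolated


default, isolated = compare_default_and_isolated()
print("default stdout:", default.stdout.strip())
print("isolated return code:", isolated.returncode)
```

The first run should print the fake message; the second should fail with "No module named hijack_demo" because the CWD is no longer searched.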

I have reported a regression. Things were working previously and now they don't. I appreciate the hard work behind the hard problem of packaging, don't get me wrong. Regressions happen.

Thank you very much for the understanding and the patience. I am very sorry for the inconvenience.

This subject is a bit tricky... I don't think we can easily classify things in terms of "backwards" compatibility or "regression" because we are talking about the interaction between 2 different tools + a standard that came into play a few years ago.

In terms of implementation, the behaviour you used to experience in the past is the behaviour of python setup.py develop.
This behaviour was not changed. In terms of python setup.py develop what you get in setuptools v63 should be what you get in v64+.

However, motivated by the approval of PEP 660, pip decided to implement a different mechanism in terms of how it interacts with the build backend, in such a way that it is impossible to re-use the implementation of the develop command. While the old implementation was still usable until v63, we have been strongly encouraged to comply with the new spec, so pip could move things forward on their side.

For a while, there was discussion on what to do on the setuptools side and a lack of consensus (which I believe still exists). So I had to make a judgment call about how to implement PEP 660. The result that you observe today is what I consider a compromise between the arguments of the two camps (see footnote 1). It "fixes" some limitations of the previous approach, but in turn it has its own limitations.

is this regression wanted? As in, is the current behaviour the wanted behaviour? If so please document it (but with a WARNING, not with obscure options which need a pass-through pip command to activate, hence don't even appear in pip --help) It would seem exceedingly bizarre to me.

Some aspects of the current implementation are wanted, and I would definitely not want to throw the baby out with the bathwater.
Unfortunately, no one in the community has demonstrated a bullet-proof mechanism to implement editable installs in such a way that it mimics the regular installation 100%. Instead we are left in a "pick your poison" situation, and the mechanism we have for picking is --config-settings editable_mode=....
Setuptools does currently raise a warning when running editable installs, and I can add more information to it. However, pip hides all warnings by default.
You will only be able to see warnings if you run pip -v install -e .

is this regression unexpected? It sounds so to me, because other modes work differently (non-editable install, strict-editable, compat-editable). If so, is it expected to be fixed and can it become part of the spec/tests?

It is expected that every single editable installation method will have flaws, and that users might have to switch between editable modes to achieve different objectives. I plan to investigate alternatives to see if we can overcome this limitation, but this may take some time. If we overcome the limitation, it should definitely be added to the tests.

That is my personal plan, but of course, other setuptools maintainers might have different ideas...


@dvarrazzo, I am sorry that I cannot provide you definitive answers at this stage.
However, there are a few things that you can do right now to avoid hitting these limitations.
For example, you might try to use the changedir configuration in tox.
I also mentioned other escape hatches that can be used (e.g. the strict mode).

Finally, as long as pip allows it, you can use the environment variable SETUPTOOLS_ENABLE_FEATURES="legacy-editable" to fall back to the previous implementation and be bug-for-bug compatible.

Footnotes

  1. I am only human so I am very aware that my judgement may be flawed. To compensate for it, I dragged the release of PEP 660 implementation in setuptools for a while and tried to gather feedback from the community as much as possible.

I am not able to install https://github.com/tensorflow/tfx-bsl in editable mode.

Setting SETUPTOOLS_ENABLE_FEATURES="legacy-editable" fixed the issue.

Unfortunately, no one in the community has demonstrated a bullet-proof mechanism to implement editable installs in such a way that it mimics the regular installation 100%. Instead we are left in a "pick your poison" situation, and the mechanism we have for picking is --config-settings editable_mode=....

Just to cross link it, there are some limitations in how config-settings operate in pip: pypa/pip#12310

What's at stake for me is the ability to use vscode/pylance with my editable repos:
microsoft/pylance-release#3473

Per recommendation above, I tried:

export SETUPTOOLS_ENABLE_FEATURES="legacy-editable"
pip install -e .

but I get the error:

ERROR: Project file:///home/a/b/c has a 'pyproject.toml' and its build backend is missing the 'build_editable' hook. Since it does not have a 'setup.py' nor a 'setup.cfg', it cannot be installed in editable mode. Consider using a build backend that supports PEP 660.

Based on some quick searching, I'm not clear whether there's a way to get SETUPTOOLS_ENABLE_FEATURES to work for me...

Hi @qci-amos, please note the following caveat for SETUPTOOLS_ENABLE_FEATURES:

Finally, as long as pip allows it, you can use the environment variable SETUPTOOLS_ENABLE_FEATURES="legacy-editable" to fall back to the previous implementation and be bug-for-bug compatible.

Pip no longer allows backends to not implement PEP 660. So that is no longer an option.
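To check whether a backend exposes the hook pip is asking for, the backend named in build-backend can be imported and inspected. A minimal sketch; backend_has_build_editable is hypothetical, and it ignores in-tree backends declared with backend-path:

```python
import importlib


def backend_has_build_editable(backend):
    """Return True if a PEP 517 backend string exposes the PEP 660 build_editable hook.

    `backend` uses the build-backend syntax from pyproject.toml:
    "module" or "module:object" (e.g. "setuptools.build_meta").
    """
    module_name, _, obj_path = backend.partition(":")
    obj = importlib.import_module(module_name)
    # Walk dotted attributes after the colon, if any
    for attr in filter(None, obj_path.split(".")):
        obj = getattr(obj, attr)
    return callable(getattr(obj, "build_editable", None))
```

With setuptools 64+ and the feature flag unset, backend_has_build_editable("setuptools.build_meta") would be expected to return True. If setuptools omits the hook when SETUPTOOLS_ENABLE_FEATURES="legacy-editable" is set (which would match the error above), it would return False in that environment instead.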

What's at stake for me is the ability to use vscode/pylance with my editable repos

I think the right question here is to ask how vscode and pylance allow you to pass customised --config-settings to pip.

I think the right question here is to ask how vscode and pylance allow you to pass customised --config-settings to pip.

Well to be clear, what I don't currently have a solution for is how to use --config-settings in the case of pip install -r. I've been having that discussion here: pypa/pip#12310

shouldn't this just be an option:

mypackage = { path = "../mypackage/", develop = true, editable_mode=strict}

then poetry can pass that flag along when installing?

or even

mypackage = { path = "../mypackage/", develop = true, flags={editable_mode: strict} }

so we have full control over how packages are installed going forward as tooling changes without poetry needing to know about all possible flags

Probably there should even be a set_env={....} option for package-specific stuff too. With the explosion of build backends, there's a ton of package-specific environment configuration that can even vary between deps (especially some of the GPU-specific settings).

shouldn't this just be an option:

mypackage = { path = "../mypackage/", develop = true, editable_mode=strict}

Not really. This does not look like something setuptools would work with.

Setuptools is a build backend and does not directly work with dependencies. It also does not do any installation.

Yes, it has a config settings flag. Look at docs for the build command.

Even though you can pass a dict to setuptools via config_settings, the dict you are proposing doesn't make much sense to setuptools, in my opinion (support by build is the least of the concerns, to be honest).

Setuptools handles the build of a single distribution as a whole. It does not make sense to express editable installations piecewise, package by package.

There is already package-dir for expressing explicit paths to packages. Reworking it to path in the way you are suggesting would require major incompatible changes. Finally, what is the difference between develop and editable? These 2 settings seem redundant.

If you want to pass editable-mode=strict via config settings, you already can do that in the current version of setuptools.