pypa/pip

pip wheel . does not follow symlinks whereas python setup.py bdist_wheel does

sbidoul opened this issue · 29 comments

When the package source code contains symlinks, python setup.py bdist_wheel follows them and store their content in the resulting wheel.

pip wheel . ignores them.

I was apparently using a wrong branch. Sorry for the noise.

After all I could reproduce. The directory structure must be as follow

  • linked
  • setup
  • setup/setup.py
  • setup/package -> ../linked

In the setup directory, when runnning setup.py bdist_wheel, the package is properly included, while running pip wheel . does not include it.

The root cause seems to be that bdist_wheel builds in place, while pip wheel copies the directory to another temporary place where it builds, and it does that without following the symlinks.

So my first question is why pip wheel does this copy when the source is a local directory?

This is going to effectively be as intended. We copy the directory so as not to modify the source directory and to contain all of our modifications to a throw away temporary directory.

@dstufft would it make sense to follow symlinks when copying?

@sbidoul Without thinking too hard about it I could see allowing that, but I'd need to make sure we weren't doing that for a reason.

@sbidoul that would present as massive security issue in many cases (symlink attacks are a pain)

@RonnyPfannschmidt I'm not sure that's true, since we're going to be executing a setup.py anyways after we copy it, I doubt there's anything you can do with symlinks you couldn't just do in the Python file itself.

@dstufft a paranoid possibility is using a symlink attack to turn a package that ships web served data into a package that ships web served data and some private details of the package builder, but building wheels in isolation should elevate that

@RonnyPfannschmidt @dstufft my point is that setup.py bdist_wheel does follow symlinks whereas pip wheel . does not. In my understanding the resulting wheel must be identical. So if there is any security implication (which I doubt) it is already present in setuptools.

@dstufft @xavfernandez does my last comment warrant reopening this?

I've just been burned by this symlink issue.

At the very least, wheel should provide a warning that it hasn't followed symlinks so the user knows what's wrong.

I think we can add a small documentation note to pip wheel as a short term workaround.

@pypa/pip-committers What do we want to do here? Following symlinks while copying seems fine to me, if that's what setuptools is doing.

setuptools doesnt make a copy of the source folder to begin with

sbidoul commented on 30 Mar 2017:
... would it make sense to follow symlinks when copying?

It would be sufficient to create a symlink in the copy pointing to the absolute path of the target of the symlink in the source which would reduce the attack risk IMO.

I think moving forward, pip is going to offload this work to PEP 517 backends and that should offload these tasks to the backends.

That said, I'll happily accept a PR that I some form improves this situation without breaking things.

Interestingly pip wheel -e . seems to work fine wrt this issue.

I now think this issue or variants of it arise in any situation where

  • setup.py / pyproject.toml is not at the root of the project
  • the build depends on resources outside of the setup.py / pyproject.toml subtree

I found three variants:

  1. build needs resources that are symlinks to files/directories outside of the subtree (the case which prompted this issue)
  2. build needs the git repository (eg when using setuptools_scm), and .git/ is in a parent directory not copied to the temp dir by pip
  3. build relies on the subdirectory name (somewhat exotic maybe, yet I have such as case where I want to create a custom build backend and part of the metadata depends on the subdirectory name)

Since pip copies the subtree to a temporary location before building, when running pip install . or pip wheel . resources that reside out of the subtree are not available for building.

I think this happens both for pep517 and legacy builds.

Note that when running pip install git+http://g.c/repo/project#egg=project&subdirectory=subdir it works fine because the whole project is available (I think no additional copy takes place in that case).

So while it could be possible to fix this particular issue (case 1.) with some strategy to preserve symlinks when copying to the temporary directory, this would not solve 2. nor 3.

I'm not sure if pep517 build isolation mandates working on a temporary copy of the source tree.

Maybe a possible approach could be to have a way for build backends to declare that they are safe for in-place builds (in an isolated build environment, but in the same source tree). Build backend would then have a way to promise to front ends they will not modify the source tree. And pip could use that information to decide to build in place without copying to a temporary directory.

Do you think it's a track worth investigating further?

To be honest, I think the best way of fixing this issue is to simply switch to building a sdist in place, and then extracting that sdist to the temporary directory. We've been talking about doing that for ages now, and I think we should bite the bullet and do it. The best approach is probably to do it for PEP 517 build only, so that issues like this won't get fixed for legacy builds, but conversely we will limit any potential issues to people using PEP 517, who are in a transition period anyway.

I've argued for build via sdist in the past, so I'm willing to work on it - with the proviso that my time is limited currently, so I can't promise it will go quickly...

PS The question of in-place builds was extensively debated during the PEP 517 discussions. I don't recall what conclusions we came to, but I do know we didn't come to a consensus (which is why PEP 517 doesn't address the issue). I'd strongly recommend researching that history before opening that question again (and I'd also caution that I'm personally not sure that this issue is significant enough to justify doing so... ☹️)

@pfmoore Thanks, I'll do some research to find those discussions about in place builds.

A quick question comes to mind, though: if builds go via sdist, does it mean building the sdist in place is considered "safer" than building the wheel in place? As an example, the setuptools build backend does modify the source tree by creating at least the .egg-info directory.

Also since this copy to a temp directory may influence the build outcome, it might be useful to clarify what frontends can and cannot do in that respect. For instance in this specific example, I just tested, and I have a situation where pip wheel . (in pep 517 mode) fails while python -m pep517.build -b . succeeds (because the latter builds in place). We may want to weed out such ambiguities while the spec is young enough?

I can't really judge the significance of the issue. My intuition is that it is relevant for all monorepo projects where, for instance, the python binding of the project is only one part of the system that lives in a a subdirectory.

if builds go via sdist, does it mean building the sdist in place is considered "safer" than building the wheel in place?

That's honestly a very good question, and not one anyone's really thought about, I suspect!

We may want to weed out such ambiguities while the spec is young enough?

Definitely, yes. The reason it isn't in PEP 517 yet is largely that everyone was burned out on the discussion, and we needed to get something out that defined the key points, even if it left certain things open to interpretation. Maybe this is the point at which we need another round of discussion on this.

Regarding significance, I agree - all I'm really going off is "it hasn't generated a lot of bug reports over the years". But that's not necessarily a good indicator, it's merely all we've got :-)

Maybe this is the point at which we need another round of discussion on this.

I can volunteer to open a thread about that on Discourse and try to summarize the matter.
I'll wait a little bit for my other PyPA topics (PEP and PRs) to land though, as I fear to have too many of them in flight at the same time :)

Some discussion about building via sdist is in #2195.

I wonder if an option to tell pip to build in place would make sense.

It would, yes.

Closed via #7882 (build in place).

We have now (per #7951) published a beta release of pip, pip 20.1b1. This release includes #7882, which implemented a solution for this issue.

I hope participants in this issue will help us by testing the beta and checking for new bugs. We'd like to identify and iron out any potential issues before the main 20.1 release on Tuesday.

I also welcome positive feedback along the lines of "yay, it works better now!" as well, since the issue tracker is usually full of "issues". :)

Unfortunately, there have been a number of issues with the implementation of in-place builds (which are being tracked under #7555) which means that for now, we need to revert #7882. As a result, this issue will become a problem again, and we'll therefore be reopening it. Longer-term, we hope to have a solution that addresses the issues that in-place builds solved, but without the impact on other workflows that the current solution had.

Sorry for the disruption that this will cause.

This is resolved now that pip builds in tree.