pip install fails when PEP658 metadata is present if the user tries to install using a non-normalized package name
chriskuehl opened this issue · 11 comments
Description
When installing packages from a registry which supports PEP658, pip install
produces an error if the user tries to install a package using its non-normalized name.
For example, testing here with the fluffy-server
package using an underscore instead of a hyphen:
$ pip install -i http://localhost:7777/simple fluffy_server
Looking in indexes: http://localhost:7777/simple
Collecting fluffy_server
Obtaining dependency information for fluffy_server from http://localhost:7777/packages/fluffy_server-1.39.2-py3-none-any.whl.metadata
Downloading http://localhost:7777/packages/fluffy_server-1.39.2-py3-none-any.whl.metadata (727 bytes)
Discarding http://localhost:7777/packages/fluffy_server-1.39.2-py3-none-any.whl (from http://localhost:7777/simple/fluffy-server/): Requested fluffy_server from http://localhost:7777/packages/fluffy_server-1.39.2-py3-none-any.whl has inconsistent Name: expected 'fluffy_server', but metadata has 'fluffy-server'
ERROR: Could not find a version that satisfies the requirement fluffy_server (from versions: 1.39.2)
ERROR: No matching distribution found for fluffy_server
Package name variants like fluffy_server
, FLUFFY-SERVER
, etc. all fail. Only the normalized name fluffy-server
succeeds.
Expected behavior
pip should allow installing using the non-normalized package name as it will be very confusing for users otherwise.
pip version
Tested with 23.1.2 and current main
Python version
3.11
OS
Linux (debian bullseye)
How to Reproduce
I have created a minimal reproduction PEP503 static file registry here which you can easily test with: https://github.com/chriskuehl/pip-pep658-normalization-bug-repro
- Clone the repo
- Start a static file server at the repo root, e.g.
python3 -m http.server 7777
- Try to install the package with
pip install -i http://localhost:7777/simple fluffy_server
and observe the error from above.
This can also be reproduced using pypi.org
, however you need to use the PEP503 (HTML /simple/
files) rather than the new JSON format. I could not find a built-in option to ask pip to use the HTML index, but if you comment out these two lines and force pip to use text/html
, you can see that pip discards the wheel and installs from the sdist instead:
$ pip install fluffy_server
Collecting fluffy_server
Obtaining dependency information for fluffy_server from https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl.metadata
Downloading fluffy_server-1.39.2-py3-none-any.whl.metadata (727 bytes)
Discarding https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl#sha256=cfa5516c13c98b17898348c7f12c6953c01965c8fdec690b70b0fab4936949c5 (from https://pypi.org/simple/fluffy-server/) (requires-python:>=3.10): Requested fluffy_server from https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl#sha256=cfa5516c13c98b17898348c7f12c6953c01965c8fdec690b70b0fab4936949c5 has inconsistent Name: expected 'fluffy_server', but metadata has 'fluffy-server'
Downloading fluffy-server-1.39.2.tar.gz (97 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.3/97.3 kB 1.3 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Code of Conduct
- I agree to follow the PSF Code of Conduct.
Hmm, I would have expected pip to always send the wheel url with .metadata
appended, which is what PEP 658 specifies, and which will work unless the server has inconsistent filenames. I can’t tell for certain (I’m on mobile right now) but I suspect PyPI isn’t following that rule (wheel file name normalisation uses underscores rather than hyphens, so maybe that’s the problem). If my suspicion is right, this will need reporting to the Warehouse project. And your local server will need fixing as well.
Hmm, I would have expected pip to always send the wheel url with
.metadata
appended, which is what PEP 658 specifies, and which will work unless the server has inconsistent filenames.
I believe it is. These are the requests I see:
127.0.0.1 - - [20/May/2023 03:16:21] "GET /simple/fluffy-server/ HTTP/1.1" 200 -
127.0.0.1 - - [20/May/2023 03:16:21] "GET /packages/fluffy_server-1.39.2-py3-none-any.whl.metadata HTTP/1.1" 200 -
It makes these same requests no matter how I enter the package name, so I believe it is requesting the correct files.
I think the problem is that when it downloads and parses the metadata, it compares the name parsed from the metadata file (which is normalized) directly to the name the user entered without normalizing it.
Perhaps a change like this:
--- src/pip/_internal/operations/prepare.py 2023-05-20 03:23:56.269263381 -0500
+++ src/pip/_internal/operations/prepare.py 2023-05-20 03:23:52.665267870 -0500
@@ -410,7 +410,7 @@
# NB: raw_name will fall back to the name from the install requirement if
# the Name: field is not present, but it's noted in the raw_name docstring
# that that should NEVER happen anyway.
- if metadata_dist.raw_name != req.req.name:
+ if metadata_dist.canonical_name != canonicalize_name(req.req.name):
raise MetadataInconsistent(
req, "Name", req.req.name, metadata_dist.raw_name
)
Seems to fix it for me but I'm not sure if there are any edge cases.
The fact that the comment above the line suggests that raw_name
was chosen specifically (and mentions that it falls back to canonical_name
which they are trying to avoid for some reason) makes me think there may be additional concerns I'm not aware of.
Ah yes - thanks for the analysis! I'll take a closer look at the code, but IMO your suggested fix looks right, so if there are any additional concerns, I'd want to fix them another way (or maybe just not worry about them unless someone reports them in a real-world situation
I tracked down a comment about name verification from the PR that introduced this code which has some context: #11111 (comment)
If I'm understanding correctly, I think they didn't want to use metadata_dist.canonical_name
directly since this isn't populated by parsing the METADATA
file, but instead passed in when constructing the object:
https://github.com/pypa/pip/pull/11111/files#diff-dc6d0ea33ced90d3436f4edb226bbccebdddf1bdf8a0c91b2cc178546198769eR113-R128
Since the goal is to verify the METADATA
file, it makes a lot of sense to not use it and instead want to use raw_name
which does come from the METADATA
file.
So perhaps it makes sense to still use raw_name
but canonicalize it?
--- src/pip/_internal/operations/prepare.py 2023-05-20 03:23:56.269263381 -0500
+++ src/pip/_internal/operations/prepare.py 2023-05-20 03:23:52.665267870 -0500
@@ -410,7 +410,7 @@
# NB: raw_name will fall back to the name from the install requirement if
# the Name: field is not present, but it's noted in the raw_name docstring
# that that should NEVER happen anyway.
- if metadata_dist.raw_name != req.req.name:
+ if canonicalize_name(metadata_dist.raw_name) != canonicalize_name(req.req.name):
raise MetadataInconsistent(
req, "Name", req.req.name, metadata_dist.raw_name
)
(Not sure if it's necessary to canonicalize the raw_name or not. In my case it is already canonicalized in the metadata but maybe some wheels have it not canonicalized for whatever reason?)
So perhaps it makes sense to still use raw_name but canonicalize it?
Yes, that's precisely the conclusion I'd just come to from looking at the code, so what you found on the tracker makes perfect sense to me. I think it's right to canonicalize explicitly, as the spec doesn't require the name in the metadata file to be normalised, and actually recommends normalising before comparison.
Would you be interested in making this fix into a PR? What you posted above plus a short changelog would be sufficient - it would be nice to add a test if you can work out how to trigger the behaviour from a test, but if not, I'd be happy to look into that. Alternatively, I'll make a PR, but I don't want to take the credit if you'd like to do it
One thought - why isn't this reproducible if you use the JSON index from PyPI? That should work the same. If it's not failing in the same way, that suggests the JSON index isn't hitting the PEP 658 fastpath in the first place (because the files will be the same either way).
Sounds good, I'll try to get a PR up later today. Will also check what's happening with the JSON index, I haven't looked into it all other than noticing that it doesn't seem to be downloading the metadata files in that mode.
Cool - it would be a shame if the new JSON form of the index is disabling the new separate metadata file
The reason this doesn't happen with the JSON index is because pip is looking for a dist-info-metadata
key while pypi.org provides a data-dist-info-metadata
key:
$ curl -H 'Accept: application/vnd.pypi.simple.v1+json' https://pypi.org/simple/fluffy-server/ | jq .files
[...]
{
"data-dist-info-metadata": {
"sha256": "4db99543165cbdeef42ccb6257545911ccd7865d65e304e3e056f383a25f309c"
},
"filename": "fluffy_server-1.39.2-py3-none-any.whl",
"hashes": {
"sha256": "cfa5516c13c98b17898348c7f12c6953c01965c8fdec690b70b0fab4936949c5"
},
"requires-python": ">=3.10",
"size": 119321,
"upload-time": "2023-05-14T04:36:58.299962Z",
"url": "https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl",
"yanked": false
}
[...]
I think pypi.org is providing the wrong key name as it should not be prefixed with data-
according to PEP691: https://peps.python.org/pep-0691/#project-detail
If I temporarily patch pip to use the (incorrect) key name for the PyPI index, I get a traceback raised:
Traceback (most recent call last):
File "/home/ckuehl/proj/pip/src/pip/_internal/cli/base_command.py", line 169, in exc_logging_wrapper
status = run_func(*args)
^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/cli/req_command.py", line 248, in wrapper
return func(self, options, args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/commands/install.py", line 377, in run
requirement_set = resolver.resolve(
^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
result = self._result = resolver.resolve(
^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
self._add_to_criteria(self.state.criteria, r, parent=None)
File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
if not criterion.candidates:
File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/structs.py", line 156, in __bool__
return bool(self._sequence)
^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
return any(self)
^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
return (c for c in iterator if id(c) not in self._incompatible_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
candidate = func()
^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
self._link_candidate_cache[link] = LinkCandidate(
^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 293, in __init__
super().__init__(
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
self.dist = self._prepare()
^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
dist = self._prepare_distribution()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 304, in _prepare_distribution
return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/operations/prepare.py", line 510, in prepare_linked_requirement
metadata_dist = self._fetch_metadata_only(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/operations/prepare.py", line 374, in _fetch_metadata_only
return self._fetch_metadata_using_link_data_attr(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/operations/prepare.py", line 384, in _fetch_metadata_using_link_data_attr
metadata_link = req.link.metadata_link()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/models/link.py", line 421, in metadata_link
metadata_link_hash = LinkHash.parse_pep658_hash(self.dist_info_metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/models/link.py", line 79, in parse_pep658_hash
name, sep, value = dist_info_metadata.partition("=")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'partition'
I think there is a bug on both sides:
- PyPI is using the wrong key name
- pip is expecting the value for that key to be a string (it tries to apply the same parsing that it does for PEP503 html indexes), but PEP691 indicates that it should be a JSON dict of hashes instead of a string).
If I further patch pip to parse the dict correctly, it reproduces the original error from this issue when using the JSON index:
$ pip install --no-cache --no-deps --only-binary :all: fluffy_server==1.39.2
Collecting fluffy_server==1.39.2
Obtaining dependency information for fluffy_server==1.39.2 from https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl.metadata
Downloading fluffy_server-1.39.2-py3-none-any.whl.metadata (727 bytes)
Discarding https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl (from https://pypi.org/simple/fluffy-server/) (requires-python:>=3.10): Requested fluffy_server==1.39.2 from https://files.pythonhosted.org/packages/f0/38/4e40f319819da80719c4ab791250f2db752f55c5f0f9450d81a84a618922/fluffy_server-1.39.2-py3-none-any.whl has inconsistent Name: expected 'fluffy_server', but metadata has 'fluffy-server'
ERROR: Could not find a version that satisfies the requirement fluffy_server==1.39.2 (from versions: 1.0.0, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.3.2, 1.4.0, 1.5.0, 1.6.0, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.5, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.11.0, 1.11.1, 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.15.1, 1.15.2, 1.15.3, 1.15.4, 1.16.0, 1.16.1, 1.17.0, 1.18.0, 1.19.0, 1.19.1, 1.19.2, 1.20.0, 1.20.1, 1.20.2, 1.20.3, 1.21.0, 1.22.0, 1.23.0, 1.23.1, 1.24.0, 1.24.1, 1.24.2, 1.25.0, 1.26.0, 1.27.0, 1.28.0, 1.29.0, 1.29.1, 1.30.0, 1.31.0, 1.32.0, 1.33.0, 1.33.1, 1.33.2, 1.33.3, 1.33.4, 1.34.0, 1.35.0, 1.36.0, 1.36.1, 1.37.0, 1.38.0, 1.39.0, 1.39.1, 1.39.2)
ERROR: No matching distribution found for fluffy_server==1.39.2
I will go file separate tickets for the pip and warehouse issues discovered. I wonder if it will be hard for warehouse to fix the key name since I think it would immediately break a lot of existing pip versions if they did due to the parsing bug.
Here are the 2 bugs:
I will still try to create a PR for this issue now.