Build a benchmark based on UV slow case
Our production user is hitting slow cases in ways that are hard for us to optimize separately. We should have clear instructions for reproducing the slow cases they are hitting, as well as simulacrums of them that can be run with less infrastructure. cc #177
How to use uv, a Python dependency resolver, as a benchmark. The example we are using, `bio_embeddings[all]` on Python 3.12, is a pathological case that has to try 100k versions.

- Install Python 3.12.
- `git clone https://github.com/astral-sh/pubgrub` (using the default branch, `main`).
- `git clone https://github.com/astral-sh/uv`
- Edit the top-level `Cargo.toml` in uv to point to the pubgrub checkout, then run `cargo build --profile profiling && mv target/profiling/uv main-uv`.
- Edit pubgrub, e.g. cherry-pick your commits. Remember that we have downstream patches, so `dev` will fail to build uv. Then run `cargo build --profile profiling && mv target/profiling/uv branch-uv`.
- Run the hyperfine command, e.g. `taskset -c 0 hyperfine --warmup 1 "./main-uv pip compile -p 3.12 --exclude-newer 2024-03-01 scripts/requirements/bio_embeddings.in" "./branch-uv pip compile -p 3.12 --exclude-newer 2024-03-01 scripts/requirements/bio_embeddings.in"`, where `bio_embeddings.in` contains `bio_embeddings[all]`. You may or may not want to use taskset; for me it helps by avoiding the efficiency cores and reducing variance. Also don't forget to set your CPU to performance mode, which is very important for variance on my machine.
- The original report for #135 can be reproduced with `scripts/requirements/boto3.in`.
- As a baseline I use `scripts/requirements/jupyter.in`, where nearly always the first version fits, so it shouldn't change.
- You can use the profiling build with samply, e.g. `samply record --rate 5000 target/profiling/uv pip compile scripts/requirements/bio_embeddings.in`.
After a lot of work, I have been able to reproduce your setup. Here is a profile I was able to collect: https://share.firefox.dev/43ioU30
By hacking in an `OfflineDependencyProvider<P, VS>`, adding dependencies to it in `add_incompatibility_from_dependencies`, and serializing it when done, I was able to get a RON file. By various dirty tricks (calling `to_string` on `P`, manually fixing prereleases, and other such complications) I converted it into something that can be run in this repo without additional dependencies.

bio_embeddings_str_SemanticVersion.txt
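The instrumentation might look roughly like the sketch below. All names here are illustrative assumptions, not pubgrub's actual API: it just records every (package, version) → dependencies edge the resolver sees and dumps the collected map in a RON-like text format at the end. The real experiment hooked `add_incompatibility_from_dependencies` and would use a proper serde/`ron` serializer rather than hand-rolled formatting.

```rust
use std::collections::BTreeMap;

/// Records dependency edges as the resolver discovers them.
#[derive(Default)]
struct Recorder {
    // package -> version -> [(dependency, version requirement)]
    deps: BTreeMap<String, BTreeMap<String, Vec<(String, String)>>>,
}

impl Recorder {
    /// Called from the hook point (in the real experiment, inside
    /// `add_incompatibility_from_dependencies`).
    fn record(&mut self, package: &str, version: &str, deps: &[(&str, &str)]) {
        let entry = self
            .deps
            .entry(package.to_string())
            .or_default()
            .entry(version.to_string())
            .or_default();
        for (dep, req) in deps {
            entry.push((dep.to_string(), req.to_string()));
        }
    }

    /// Serialize by hand into a RON-like string; a real implementation
    /// would derive `serde::Serialize` and use the `ron` crate instead.
    fn to_ron_like(&self) -> String {
        let mut out = String::from("{\n");
        for (package, versions) in &self.deps {
            for (version, deps) in versions {
                out.push_str(&format!("  (\"{package}\", \"{version}\"): ["));
                let items: Vec<String> = deps
                    .iter()
                    .map(|(d, r)| format!("(\"{d}\", \"{r}\")"))
                    .collect();
                out.push_str(&items.join(", "));
                out.push_str("],\n");
            }
        }
        out.push('}');
        out
    }
}
```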
Here is a profile of running it through our normal `large_case` benchmark runner: https://share.firefox.dev/495a8hG
Unfortunately it still looks quite different. I think the biggest difference is your incorporation of the `RPITIT` branch. Another safe difference is that `large_case` attempts to resolve every version, not just the root version. It therefore biases data toward the problems of small dependency trees, which tend to be dominated by the process of reading in data. Alternatively it may have to do with the difference in representation or semantics between `&str` vs `PubGrubPackage`, or `SemanticVersion` vs `Version`. Or it may have to do with not representing filtered-out versions.

Definitely more investigation is required.
> I think the biggest difference is your incorporation of the `RPITIT` branch. Another safe difference is that `large_case` attempts to resolve every version, not just the root version. It therefore biases data toward the problems of small dependency trees, which tend to be dominated by the process of reading in data.
I ran the new benchmark file on the `RPITIT` branch, and I altered our `large_case` to only resolve if `p.to_string() == "root"`. The resulting profile looks the same. So the process of elimination continues.
> Alternatively it may have to do with the difference in representation or semantics between `&str` vs `PubGrubPackage`, or `SemanticVersion` vs `Version`.
`String` with `Arc<SemanticVersion>` still looks the same. So it's time to switch tactics, from modifying PubGrub to get the RON file to cooperate, to looking at uv.
> Or it may have to do with not representing filtered out versions.
Using lots of dirty hacks, I was able to instrument `add_incompatibility` as well as `add_incompatibility_from_dependencies`. The resulting file also needed to be cleaned up by hand.

bio_embeddings_2_str_SemanticVersion.txt

However, the resulting profile still looks the same: https://share.firefox.dev/3Pso4LE
> Alternatively it may have to do with the difference in representation or semantics between `&str` vs `PubGrubPackage`, or `SemanticVersion` vs `Version`.
This time I took the output of my instrumented uv directly, without any hand fixes.

bio_embeddings_3_str_pep440.txt

I changed `large_case` to use the `pep440_rs` crate (with the appropriate suffix on the file). The resulting profile is not fundamentally different: https://share.firefox.dev/49XrNsF
I'm seeing that `pick_highest_priority_package` is big in your profile. In uv, we assign priority by the order in which packages are first seen. When we visit root, we iterate over root's deps and give the first the highest priority, the second the second highest, etc. When we select foo, we visit all deps of foo and also add them to our priority mapping.
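That discovery-order scheme can be sketched as a counter plus a hash map; each package gets a priority the first time it is seen, so later lookups are constant-time map reads. The type and method names below are hypothetical, not uv's actual code:

```rust
use std::collections::HashMap;

/// Assigns each package a priority based on discovery order.
#[derive(Default)]
struct DiscoveryPriorities {
    next: u64,
    by_package: HashMap<String, u64>,
}

impl DiscoveryPriorities {
    /// Register a package when it is first discovered; repeat visits
    /// keep the original priority.
    fn discover(&mut self, package: &str) -> u64 {
        if let Some(&p) = self.by_package.get(package) {
            return p;
        }
        let p = self.next;
        self.next += 1;
        self.by_package.insert(package.to_string(), p);
        p
    }

    /// O(1) lookup used when the resolver asks for a package's priority.
    fn priority(&self, package: &str) -> Option<u64> {
        self.by_package.get(package).copied()
    }
}
```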
That does make `pick_highest_priority_package` O(1), but it also pushes a lot of work into the guts of the resolver. For example, in this case with your FIFO `.map(|priority| priority + 1)` the run time is 7.62 s, but if we pick FILO `.map(|priority| !(priority + 1))` it takes 0.18 s. Without a consistent and logical ordering for package arrival, these are equally arbitrary; I'm sure there are other cases where FIFO beats FILO. (The ordering is not entirely arbitrary: packages are ordered by when they were discovered, which correlates with when they were first pre-fetched. So FIFO is a good default for not waiting on the network.)
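To illustrate the two closures above: with a discovery counter as the priority, `!(priority + 1)` bitwise-negates the value and therefore inverts the ordering, so taking the maximum under one map is the same as taking the minimum under the other. A minimal self-contained sketch (package names and counters are made up, and which label counts as FIFO depends on how the resolver consumes the priorities):

```rust
// Pick the package whose mapped priority is largest.
fn pick(discovered: &[(&'static str, u32)], map: fn(u32) -> u32) -> &'static str {
    discovered
        .iter()
        .max_by_key(|(_, p)| map(*p))
        .expect("non-empty")
        .0
}

fn main() {
    // Discovery counters: "a" was discovered first, "c" last.
    let discovered = [("a", 0), ("b", 1), ("c", 2)];
    // `priority + 1` preserves the counter's order, so the max is the
    // most recently discovered package.
    let latest = pick(&discovered, |p| p + 1);
    // `!(priority + 1)` bitwise-negates and thus reverses the order,
    // so the max is now the earliest-discovered package.
    let earliest = pick(&discovered, |p| !(p + 1));
    println!("{latest} {earliest}"); // → "c a"
}
```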
The `OfflineDependencyProvider` uses "fewest number of versions matching the requirement". I talk about the history of this heuristic in astral-sh/uv#1398 (comment).
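A minimal sketch of that heuristic, assuming a toy index mapping each package to its available versions and a requirement expressed as a predicate (all names here are illustrative, not pubgrub's API):

```rust
use std::collections::HashMap;

/// Pick the package with the fewest candidate versions matching its
/// requirement: resolving the most constrained package first tends to
/// fail fast and prune the search. (A package with zero matching
/// versions sorts first, which is desirable, since it is a dead end.)
fn most_constrained<'a>(
    index: &HashMap<&'a str, Vec<u32>>,
    requirements: &[(&'a str, fn(u32) -> bool)],
) -> Option<&'a str> {
    requirements
        .iter()
        .min_by_key(|(package, matches)| {
            index
                .get(package)
                .map(|versions| versions.iter().filter(|v| matches(**v)).count())
                .unwrap_or(0)
        })
        .map(|(package, _)| *package)
}
```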