Better balanced shards
A5rocks opened this issue · 1 comments
I would love it if mypy-primer balanced its shards better. On a recent PR to mypy, I noticed that:
- shard 0 only took 7 minutes
- shard 1 took 27 minutes
- shard 2 took 14 minutes
- shard 3 took 20 minutes
- shard 4 took 21 minutes
(Note that mypy-primer could take ~10 minutes less if the shards were optimally balanced.)
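To make the ~10 minute estimate concrete, here is a rough back-of-the-envelope check using the shard times listed above. It assumes the total work could be split perfectly evenly and ignores any fixed per-shard setup cost, so it is a lower bound rather than something a real scheduler would necessarily hit.

```python
# Shard runtimes in minutes, as reported in the list above.
shard_minutes = [7, 27, 14, 20, 21]

# The run finishes when the slowest shard finishes.
wall_clock = max(shard_minutes)

# With a perfectly even split, every shard would take the mean time.
ideal = sum(shard_minutes) / len(shard_minutes)

# Potential saving under that (optimistic) assumption.
saving = round(wall_clock - ideal, 1)
print(f"slowest shard: {wall_clock} min, ideal: {ideal:.1f} min, saving: {saving} min")
```

The slowest shard takes 27 minutes, while an even split of the 89 total minutes would be about 17.8 minutes per shard, which gives roughly 9 minutes of possible saving.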
I know that it would be infeasible to construct lists for every single combination, so I propose:
What if every project had a "difficulty" number that was a rough estimate of the time mypy takes to type check it? The idea is that you could try to balance these numbers into buckets (just use a greedy approach: going from largest difficulty to smallest, always put the project in the bucket with the lowest total difficulty so far).
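A minimal sketch of that greedy idea, assuming difficulties are available as a plain mapping from project name to a number. The function name `balance_shards`, the project names, and the difficulty values are all hypothetical and are not taken from mypy-primer.

```python
from __future__ import annotations

import heapq


def balance_shards(difficulty: dict[str, float], num_shards: int) -> list[list[str]]:
    """Assign projects to shards, hardest first, always into the lightest shard."""
    # Min-heap of (total difficulty so far, shard index).
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards: list[list[str]] = [[] for _ in range(num_shards)]

    # Largest difficulty first; break ties by name so the result is deterministic.
    for project, cost in sorted(difficulty.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)  # shard with the lowest total so far
        shards[idx].append(project)
        heapq.heappush(heap, (load + cost, idx))
    return shards


# Hypothetical difficulty numbers, purely for illustration.
example = {"proj_a": 10, "proj_b": 7, "proj_c": 5, "proj_d": 4, "proj_e": 3, "proj_f": 1}
result = balance_shards(example, num_shards=2)
print(result)
```

With these made-up numbers both shards end up with a total difficulty of 15. The greedy approach does not guarantee an optimal split in general, but it is cheap and tends to do well when a few large projects dominate.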
I'm not sure how we could keep these up to date though. Is there a metric that is simple to take but that correlates with mypy runtime? Number of files? Dependencies? Lines of code? Count of `import typing`?
Yeah, agreed that this would be nice.
The distribution is quite head heavy (cough sympy, pandas, graphql cough), so I think you could get most of the benefit by just adding a manual score to the longest ones. `mypy_primer --measure-project-runtimes --concurrency 1` should show project runtimes.
An amusing fact: at one point I noticed there was a particularly bad sharding, so my quick fix was:
Line 60 in 236dab3
A random musing: something I've been curious about but haven't looked into yet is how the speed difference between mypyc-compiled mypy and pure-Python mypy varies across projects.