Build a high-performance version of `resolve`
cpojer opened this issue ยท 23 comments
In Jest, we use jest-resolve
to resolve modules. However, it shells out to two other dependencies, resolve
and browser-resolve
. resolve
hasn't been maintained as much as I'd like, we have a hacky workaround inside of jest-resolve
and we actually have knowledge of the entire filesystem (minus directories) in "HasteFS" (currently not passed to jest-resolve but that can be changed as the places it is created already passes a "ModuleMap" which is also coming from jest-haste-map
).
The idea is to take the synchronous part of resolve
and put it inside of jest-resolve
. Then rewrite it and fine-tune it for performance (cc @trueadm) and use "HasteFS" to optimize the best case scenario (you can use jest-file-exists
which is O(1) in the best case and fs.stat in the worst case).
The end state should be for Jest not to depend on resolve
any longer. We do not need the asynchronous part of resolve.
Call to resolve: https://github.com/cpojer/jest/blob/master/packages/jest-resolve/src/index.js#L84
See: https://github.com/substack/node-resolve/blob/master/lib/sync.js
Follow-up: We should also aim at replacing browser-resolve and making it configurable so that our resolution algorithm can look at any field in package.json to re-route resolution to a module. This can be done in a follow-up, because this is already a big task :)
browser-resolve: https://github.com/defunctzombie/node-browser-resolve
Update (re #3597) - I added a super simple memoization to resolve/lib/sync.js
, seems like it does the trick for me: the number of statSync()
calls dropped by an order of magnitude
Since HasteFS isn't available to the public, are there any other plans to improve the performance of resolve
?
HasteFS is open source and part of jest-haste-map, it's simply a simple virtual filesystem that contains a list of all files in a project.
It's all automatic, jest-haste-map is used throughout Jest. jest-resolve
only uses ModuleMap
but not HasteFS. We can update it to use HasteFS and fall through to fs
if the file isn't found. This would optimize the average case, but not worst case, of resolving modules.
Here is the workaround I use in my setup to speedup jest-resolve
:
https://gist.github.com/urish/1a106ad6d1dbfcf77f5f744ae854fd6a
For me, this small change reduces the test suite run time from about 21 to 14 seconds when using workers, and from about 27 seconds to 16.5 seconds when using --runInBand
(I have an Angular app with 250 tests)
I've given this an initial look over in tvald-contrib/jest@feature/haste-resolve.
I think the biggest hurdle is that hasteFS
doesn't track directories, just files. Most of our time is spent recursively searching for node_modules
. Without a cached directory listing, the replacement for resolve
still have to test for the existence of that directory up a potentially deep hierarchy.
I'll continue working on this, however, as there are several obvious optimizations for resolve
now that I'm familiar with jest
's codebase.
@tvald woah, awesome! I'm happy to review pull requests. If your first change is simply to integrate all of resolve's code into jest-resolve, then we can take it from there. Generally, the smaller and more incremental your pull requests are, the more likely it is that we'll merge them quickly :)
I share the hasteFS concern. If watchman can return directories I'm not opposed to add directories to the hasteFS data structure.
Working on the initial PR over at #4315. Once that's in, I'll submit a tiny follow-up PR with memoization that improves resolve performance by orders of magnitude. HasteFS might not even be necessary, although it would certainly improve performance even further.
Memoized PR is at #4323.
Also, noting in this thread that @cpojer requested the following:
Thanks for the initial PR, this is great! In a follow-up, do you mind changing all the single-expression if-statements to wrap them with curly-braces? That's the code style we use in this repo :)
>> original comment
Also, could we add support for the "browser" field in a follow-up and remove "browser-resolve"? I would love it if we could drop that dependency :)
>> original comment
Moving discussion back from #4323:
...this will not work for things like Jest's watch mode because files may change over time of Jest's usage and the resolution algorithm should work with what is currently on the system, not what used to be there in the past.
To solve this, the Resolver class is able to be instantiated. We can merge the resolution logic into the base class and use a cache within there, and each test will receive it's own copy of the resolver, which will be recreated for individual test runs.
This will not share the resolution logic between two individual test files, so won't be as fast as it could get, but within the dependency graph of a single test, the performance optimization should still apply.
We can also change this to memoize for the duration of a single test run, and that would be fine with me, but there is no concept in the worker process that manages the lifetime of a full test run, so we'd have to introduce those hooks somewhere in the workers, which is hard because they are supposed to be stateless...
There are some smaller optimizations which don't involve memoization that I'll open a PR for.
Note that Resolver.findNodeModule()
is static and is directly called from jest-config
in several places:
$ grep findNodeModule -R packages --exclude-dir=build --exclude-dir=__tests__ | cut -f1 -d: | uniq -c
5 packages/jest-config/src/normalize.js
3 packages/jest-config/src/utils.js
3 packages/jest-resolve/src/index.js
Most resolution does seem to occur via an instance of Resolver
, as described by @cpojer. However, I'm not familiar enough with Jest's codebase to see an easy path beyond this memoization within a single test.
Unfortunately, I don't think that alone will provide enough of a performance boost for Jest to be usable with our codebase.
New PR at #4325, with cleaned up code and the slight optimization of testing whether a directory exists before attempting to resolve (potentially 10+) paths within that directory.
After some profiling, it looks like memoization-per-test doesn't offer a significant improvement. I'll submit a PR anyways, since HasteFS support should be quite similar to memoization.
Performance-wise, we saw our full test suite drop from ~230s to ~220s (-10s / -4%).
By memoizing the default resolver, we're seeing a 3x speed boost in our tests. More details here: #4323 (comment)
@tvald you wouldn't by any chance be open to working more on this? ๐ Jest is MIT now
Looking at this now in 2020, I think we should reverse course and go back to using the resolve
package.
- it is excellently maintained these days
- it has
packageFilter
which we can use to lookupbrowser
instead ofmain
field and drop thebrowser-resolve
dependency - it has a pluggable
readFile
,isFile
andisDirectory
functions, so if we want to we can use HasteFS to implement its FS operations, falling back to normalfs
if needed. We can also cache/memoize if we want - changes are happening to resolution logic in node (ESM, added options such as
paths
), and keeping up is painful - related to the above, ESM supports async resolution, which
resolve
comes with (albeit as callback, but easy enough to useutil.promisify
or some such)
Yeah that makes sense. The main intent of this issue was to build a really fast resolver, I don't care as much about the implementation details. Resolution is one of the hottest paths during a test run, so a 20% improvement for resolution may result in a 10-20% improvement in overall test runtime.
This issue is stale because it has been open for 1 year with no activity. Remove stale label or comment or this will be closed in 30 days.
In general, the entire jest-resolve
module is quite messy (the added complexity of async resolution and ESM resolution (which it doesn't even do correctly: #9885)) and it needs a solid refactor. As part of that we should keep performance in mind.
Related, we've added caches for most file operations, so some optimization has landed already.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.