jestjs/jest

Build a high-performance version of `resolve`

cpojer opened this issue ยท 23 comments

In Jest, we use jest-resolve to resolve modules. However, it shells out to two other dependencies, resolve and browser-resolve. resolve hasn't been maintained as much as I'd like, we have a hacky workaround inside of jest-resolve and we actually have knowledge of the entire filesystem (minus directories) in "HasteFS" (currently not passed to jest-resolve but that can be changed as the places it is created already passes a "ModuleMap" which is also coming from jest-haste-map).

The idea is to take the synchronous part of resolve and put it inside of jest-resolve. Then rewrite it and fine-tune it for performance (cc @trueadm) and use "HasteFS" to optimize the best case scenario (you can use jest-file-exists which is O(1) in the best case and fs.stat in the worst case).

The end state should be for Jest not to depend on resolve any longer. We do not need the asynchronous part of resolve.

Call to resolve: https://github.com/cpojer/jest/blob/master/packages/jest-resolve/src/index.js#L84
See: https://github.com/substack/node-resolve/blob/master/lib/sync.js

Follow-up: We should also aim at replacing browser-resolve and making it configurable so that our resolution algorithm can look at any field in package.json to re-route resolution to a module. This can be done in a follow-up, because this is already a big task :)
browser-resolve: https://github.com/defunctzombie/node-browser-resolve

cc @DmitriiAbramov @wanderley

@cpojer I made a follow up comment in my PR about part 2 of this issue. Waiting for your response to finish the implementation in the current state of things.

urish commented

Update (re #3597) - I added a super simple memoization to resolve/lib/sync.js, seems like it does the trick for me: the number of statSync() calls dropped by an order of magnitude

tvald commented

Since HasteFS isn't available to the public, are there any other plans to improve the performance of resolve?

HasteFS is open source and part of jest-haste-map, it's simply a simple virtual filesystem that contains a list of all files in a project.

@cpojer Is there a guide to setting this up?

It's all automatic, jest-haste-map is used throughout Jest. jest-resolve only uses ModuleMap but not HasteFS. We can update it to use HasteFS and fall through to fs if the file isn't found. This would optimize the average case, but not worst case, of resolving modules.

urish commented

Here is the workaround I use in my setup to speedup jest-resolve:

https://gist.github.com/urish/1a106ad6d1dbfcf77f5f744ae854fd6a

For me, this small change reduces the test suite run time from about 21 to 14 seconds when using workers, and from about 27 seconds to 16.5 seconds when using --runInBand (I have an Angular app with 250 tests)

tvald commented

I've given this an initial look over in tvald-contrib/jest@feature/haste-resolve.

I think the biggest hurdle is that hasteFS doesn't track directories, just files. Most of our time is spent recursively searching for node_modules. Without a cached directory listing, the replacement for resolve still have to test for the existence of that directory up a potentially deep hierarchy.

I'll continue working on this, however, as there are several obvious optimizations for resolve now that I'm familiar with jest's codebase.

@tvald woah, awesome! I'm happy to review pull requests. If your first change is simply to integrate all of resolve's code into jest-resolve, then we can take it from there. Generally, the smaller and more incremental your pull requests are, the more likely it is that we'll merge them quickly :)

I share the hasteFS concern. If watchman can return directories I'm not opposed to add directories to the hasteFS data structure.

tvald commented

Working on the initial PR over at #4315. Once that's in, I'll submit a tiny follow-up PR with memoization that improves resolve performance by orders of magnitude. HasteFS might not even be necessary, although it would certainly improve performance even further.

tvald commented

Memoized PR is at #4323.

Also, noting in this thread that @cpojer requested the following:

Thanks for the initial PR, this is great! In a follow-up, do you mind changing all the single-expression if-statements to wrap them with curly-braces? That's the code style we use in this repo :)
>> original comment

Also, could we add support for the "browser" field in a follow-up and remove "browser-resolve"? I would love it if we could drop that dependency :)
>> original comment

tvald commented

Moving discussion back from #4323:

...this will not work for things like Jest's watch mode because files may change over time of Jest's usage and the resolution algorithm should work with what is currently on the system, not what used to be there in the past.

To solve this, the Resolver class is able to be instantiated. We can merge the resolution logic into the base class and use a cache within there, and each test will receive it's own copy of the resolver, which will be recreated for individual test runs.

This will not share the resolution logic between two individual test files, so won't be as fast as it could get, but within the dependency graph of a single test, the performance optimization should still apply.

We can also change this to memoize for the duration of a single test run, and that would be fine with me, but there is no concept in the worker process that manages the lifetime of a full test run, so we'd have to introduce those hooks somewhere in the workers, which is hard because they are supposed to be stateless...

>> original comment @cpojer

tvald commented

There are some smaller optimizations which don't involve memoization that I'll open a PR for.

Note that Resolver.findNodeModule() is static and is directly called from jest-config in several places:

$ grep findNodeModule -R packages --exclude-dir=build --exclude-dir=__tests__ | cut -f1 -d: | uniq -c
   5 packages/jest-config/src/normalize.js
   3 packages/jest-config/src/utils.js
   3 packages/jest-resolve/src/index.js

Most resolution does seem to occur via an instance of Resolver, as described by @cpojer. However, I'm not familiar enough with Jest's codebase to see an easy path beyond this memoization within a single test.

Unfortunately, I don't think that alone will provide enough of a performance boost for Jest to be usable with our codebase.

tvald commented

New PR at #4325, with cleaned up code and the slight optimization of testing whether a directory exists before attempting to resolve (potentially 10+) paths within that directory.

tvald commented

After some profiling, it looks like memoization-per-test doesn't offer a significant improvement. I'll submit a PR anyways, since HasteFS support should be quite similar to memoization.

Performance-wise, we saw our full test suite drop from ~230s to ~220s (-10s / -4%).

tvald commented

Just noting that in light of #4109 our team is considering divestment from dependencies licensed under BSD+Patents, so I won't be contributing any further patches until that's settled. #4325 should at least have made jest usable with larger codebases.

By memoizing the default resolver, we're seeing a 3x speed boost in our tests. More details here: #4323 (comment)

@tvald you wouldn't by any chance be open to working more on this? ๐Ÿ˜€ Jest is MIT now

Looking at this now in 2020, I think we should reverse course and go back to using the resolve package.

  • it is excellently maintained these days
  • it has packageFilter which we can use to lookup browser instead of main field and drop the browser-resolve dependency
  • it has a pluggable readFile, isFile and isDirectory functions, so if we want to we can use HasteFS to implement its FS operations, falling back to normal fs if needed. We can also cache/memoize if we want
  • changes are happening to resolution logic in node (ESM, added options such as paths), and keeping up is painful
  • related to the above, ESM supports async resolution, which resolve comes with (albeit as callback, but easy enough to use util.promisify or some such)

Yeah that makes sense. The main intent of this issue was to build a really fast resolver, I don't care as much about the implementation details. Resolution is one of the hottest paths during a test run, so a 20% improvement for resolution may result in a 10-20% improvement in overall test runtime.

This issue is stale because it has been open for 1 year with no activity. Remove stale label or comment or this will be closed in 30 days.

In general, the entire jest-resolve module is quite messy (the added complexity of async resolution and ESM resolution (which it doesn't even do correctly: #9885)) and it needs a solid refactor. As part of that we should keep performance in mind.

Related, we've added caches for most file operations, so some optimization has landed already.

#8388
#8412
#8650
#9873
#11076
#11969

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.