xojs/xo

Huge memory usage regression: 580 MB in 0.43 → 6300 MB in 0.44

andersk opened this issue · 28 comments

Using Node.js 14.17.5 with NPM 6.14.14 for this reproduction recipe (npx works slightly differently in NPM 7).

$ git clone -b v5.8.1 https://github.com/zulip/zulip-desktop.git
$ cd zulip-desktop
$ npm i
$ rm -rf node_modules/.cache
$ npx -p xo@0.43.0 time xo

  110 errors
Command exited with non-zero status 1
45.72user 0.89system 0:28.89elapsed 161%CPU (0avgtext+0avgdata 581408maxresident)k
0inputs+0outputs (0major+155120minor)pagefaults 0swaps
$ rm -rf node_modules/.cache
$ npx -p xo@0.44.0 time xo

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Command terminated by signal 6
200.27user 7.35system 1:29.69elapsed 231%CPU (0avgtext+0avgdata 2274772maxresident)k
0inputs+0outputs (0major+717566minor)pagefaults 0swaps
$ rm -rf node_modules/.cache
$ NODE_OPTIONS=--max-old-space-size=8192 npx -p xo@0.44.0 time xo

  148 errors
Command exited with non-zero status 1
182.59user 5.34system 1:54.32elapsed 164%CPU (0avgtext+0avgdata 6284536maxresident)k
2808inputs+0outputs (2major+1631648minor)pagefaults 0swaps

I bisected this memory usage regression to e2e715d “Simplify lintFiles (#583)”. Cc @fisker.

@fisker any chance you could have a look at this please? The slowness is quite noticeable. Thanks!

I'm sorry for the regression.

Before, we searched for config files up front, before linting. That caused a problem here, and it was also very slow when I wanted to lint a single file: xo looked for all config files (including package.json), even very deep/unrelated ones.

Now we search for config files one by one while actually linting. That solves the problem I described above, but I guess it is also the reason for the huge memory usage.

I'm not sure what to do.

Maybe queueing the lint jobs could solve it?

I'm not familiar enough with the codebase to be able to help. Maybe @sindresorhus has a suggestion.

I'm thinking this issue is resolved in 0.45.0. I'm not seeing memory crashes.
But I'm sure there are still ways to speed it up. It takes about a full minute to run yarn xo; not sure if that's typical or not. Fairly typical React web app, 100% TypeScript.

@devinrhode2 Not sure why you think that. There are no relevant commits between 0.44.0 and 0.45.0, and the problem still reproduces with the reproduction instructions from my original report.

Hmm... maybe the crash disappears with Node 16.10.0?

I am noticing the crash with our new Nx monorepo on Node v16.10.0, but not with a smaller, CRA-based project.
Our Nx monorepo has probably 100 tsconfig.json files; not sure if that's a factor, though.

Note that, like I alluded to in the report, npx in the NPM 7 distributed with Node.js 16 has a bug where it incorrectly prefers the installed version over the version explicitly provided on the command line (npm/cli#3210).

But, if you account for that (and don’t forget to delete node_modules/.cache), this reproduces equally with Node.js 16.

So, super gross, but would checking node_modules/.cache into git work around the issue in the short term?

Of course, you'd need to use v0.43.0 to populate the cache.

Another work-around could be running xo on just sub-sections of our code.

Actually... would checking in ./node_modules/.cache generally be a good idea for already-slow CI/CD pipelines?

Another odd idea:
Intuitively, async/await gives us higher performance because more things can happen at the same time.

But I suspect it can also lead to higher memory usage: V8 is going along, hits an await, holds that state in memory, and continues on looking for other work to do.

However, if some await calls are on a critical code path and there is no other meaningful work for V8 to pick up, then changing those to synchronous read/write/exec calls could improve memory usage.
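
For example (purely illustrative; the path and the read here are made up, not actual xo code):

import fs from 'fs';
import path from 'path';

// Hypothetical read sitting on the critical path.
const configPath = path.join(process.cwd(), 'package.json');

// Async version: the pending promise and whatever it closes over stay alive
// while the event loop looks for other work to interleave.
// const pkg = JSON.parse(await fs.promises.readFile(configPath, 'utf8'));

// Sync version: blocks the thread, but nothing else piles up in the meantime.
const pkg = JSON.parse(fs.readFileSync(configPath, 'utf8'));
console.log(pkg.name);
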
@fisker - thoughts?

Yet another perspective: in theory this would be a feature. When I run XO, it sits there for at least 60 seconds and then prints everything out. That must be quite a large amount of stuff to hold in memory. If it instead printed results file by file, as it processed each file, it would be faster. I think this would also better align per-file performance with whole-repo performance.
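
For what it's worth, a minimal sketch of what that could look like using ESLint's public API directly (not xo's internals; the glob and formatter name are just examples):

import {ESLint} from 'eslint';
import {globby} from 'globby';

const eslint = new ESLint();
const formatter = await eslint.loadFormatter('stylish');

// Lint and print one file at a time, so finished results can be dropped
// instead of being held until the whole run completes.
for (const file of await globby('**/*.{ts,tsx}')) {
  const results = await eslint.lintFiles([file]);
  process.stdout.write(await formatter.format(results));
}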

Latest xo consistently crashes with an out-of-memory error (I have 8GB of RAM) for me too. Would it be possible to revert e2e715d until this issue is resolved?

I tried limiting the concurrency and it didn't seem to help (4f05195), so the issue might be somewhere else.

I also tried upgrading to ESLint 8, which also didn't help.

@fisker Using the dev tools memory inspector might reveal where the memory leak is.

I took a quick snapshot in Chrome and it seems like most of the memory is spent on TypeScript. My guess is that since we're now creating a new ESLint instance (new ESLint()) for each file, something regarding ESLint plugins is not cleaned up, and we keep creating new TypeScript parser instances (which are huge) without releasing them.
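
To illustrate the suspected shape of the problem (a rough sketch only, not xo's actual code; the two helper names are made up):

import {ESLint} from 'eslint';

// Suspected pattern: a fresh ESLint instance (and therefore a fresh
// @typescript-eslint parser/program) per file, none of them released
// until every lint finishes.
const lintFilesSeparately = files =>
  Promise.all(files.map(file => new ESLint().lintFiles([file])));

// Sharing one instance across files lets parser/plugin state be reused.
const lintFilesTogether = files => new ESLint().lintFiles(files);

Whether the fix is to share an instance or to explicitly release parser state between files would need profiling.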

We use GitHub Actions and after upgrading to 0.44 our CI started failing as the hosted runners were running out of memory (they have 7GB of RAM according to the docs)

For now we've downgraded and are sticking with 0.43 until this issue is resolved. This was on a 6 package monorepo FYI, and the individual packages aren't crazy large or anything.

Thought I'd mention it in case others are quietly hitting the same issue.

Of course users could run XO on a per-file basis...

yarn xo --fix path/to/file.ts
yarn xo --fix path/to/file2.ts

We can churn through all files like this with a little scripting...

import {globby} from 'globby';
import pMap from 'p-map';
import {execa} from 'execa';

const fileList = await globby('**/*.{ts,tsx}');

// Lint a few files at a time, each in its own xo process, so memory is released as each process exits.
await pMap(
  fileList,
  file => execa('yarn', ['xo', '--fix', file]),
  {concurrency: 5}
);

Update: Completed script here: https://gist.github.com/devinrhode2/37727d116b723d9509cd7d7ff4726c64

This could be baked directly into XO as a "low-memory mode", in case people are dealing with CIs that have very little RAM. (Who knows, maybe some sub-dependency has a memory regression, and then suddenly this issue pops up again.)

A nice API might look like:

yarn xo --low-memory --concurrency=5 --fix

I think v0.45.0 plus this trick alone will be slower than v0.43.0, but it could be optimized further later.

It may be more accurate to call this a "sequential" mode, since the idea is to process one (or more) files at a time, output all the results (freeing the associated memory), then move on to the next file and process that one entirely, output its results, and so on.

I have completed a top-level wrapper script which runs xo on one to many files at a time: https://gist.github.com/devinrhode2/37727d116b723d9509cd7d7ff4726c64
Based on how many GB of free RAM you tell the script you have, it will guess how many files can be sent to xo at one time.

@sindresorhus I think this could be an OK short-term solution, and we can improve it into a better medium-term escape hatch people can use. We can't predict what may cause a memory regression (TS itself, some new type-aware rule, someone simply having large files, etc.). Of course we want to improve memory usage and keep introducing optimizations, but this may help people in the short term.

I believe there are many situations that can lead to running out of memory. I did some extremely brief searching for memory issues in the @typescript-eslint repo and noticed these issues exist there too. The root problem, I'm guessing, lives inside TypeScript itself and only appears when @typescript-eslint is doing type-aware linting.

In fact, I am able to observe a memory issue with vanilla xo v0.43.0 and TS v4.4.4. I believe XO should specify precise versions of dependencies, and the create-xo (npm init xo) script should remove all conflicting deps (eslint, eslint-config-prettier, etc.).

I feel we should all note which TS version we are using when we hit these memory issues, and XO should specify typescript as a peerDependency with an exact version.

I believe I've also seen memory issues with TS 4.4.2 and 4.4.3.

@sindresorhus - maybe using ts 4.3.5 will fix the issue for ow?

@devinrhode2 If you take a look at my original test case, you’ll see that it’s using TypeScript 4.3.5. You’ll also see that I already did the work of bisecting the problem to a specific xo commit. It was not introduced by a TypeScript upgrade or your other suggestions. If you are seeing a problem related to those things, I suggest filing a separate issue with a detailed test case.

@andersk - sorry I'm a bit all over the place.

I've been messing around with my "top-level wrapper" script. I've determined that the more files you send to XO at the same time, the faster the overall process is. Now, if you are only linting 5 files, the speed boost may not be very meaningful; maybe you want the UX of printing file by file.

Of course, we can use NODE_OPTIONS="--max-old-space-size=131072" yarn xo --fix or node --max-old-space-size=131072 ./node_modules/xo/cli.js --fix to increase memory threshold.

Of course, ideally we never have memory issues. TS could make changes tomorrow that legitimately/drastically reduce memory usage; it's also possible that never happens.

Anyway.

Here's an outline of an approach I think can work as far as a top-level script is concerned:

  1. We need some heuristic for the MB cost per file, or per line, or whatever. Running xo file by file, I've seen processes take about 350-400MB of RAM. That's a bad measurement; we should instead send 10 files to XO and compare that to sending 20 files, etc. It's easy to rabbit-hole here – we just want a heuristic.
  2. Given this heuristic, we also need to make some guess at how much available memory XO can use.
    Perhaps if process.env.CI is "true" we assume something like 2GB; if not, we assume something like 4 or 8GB. Although, if we could actually measure it, that would be amazing. This could also be set explicitly by the user.

Given we have 4GB free, and our heuristic estimates each file to take up about 500MB, we can send 8 files at a time to XO. Somehow, we need to set Node's --max-old-space-size flag to something like $FREE_GB+($FREE_GB*0.2) (a 20% buffer, or more to be safe; a 100% buffer is fine, I believe).

Summary:

  1. Develop a heuristic for determining how much memory will be consumed
  2. Set --max-old-space-size to be above that estimate

Given a budget of 4GB of RAM, and assuming each file will use up about 500MB, we then send 8 files at a time to XO.
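
A rough sketch of that arithmetic (the numbers, glob, and flag plumbing here are all assumptions, not measurements):

import {globby} from 'globby';
import pMap from 'p-map';
import {execa} from 'execa';

// Assumed budget and per-file cost; both should come from flags or measurement.
const freeMemoryMb = 4096;
const estimatedMbPerFile = 500;
const batchSize = Math.max(1, Math.floor(freeMemoryMb / estimatedMbPerFile)); // 8

// Give each xo process the full budget plus a 20% buffer.
const maxOldSpaceMb = Math.round(freeMemoryMb * 1.2);

const files = await globby('**/*.{ts,tsx}');
const batches = [];
for (let index = 0; index < files.length; index += batchSize) {
  batches.push(files.slice(index, index + batchSize));
}

// One xo invocation per batch; a couple of invocations in flight at once.
await pMap(
  batches,
  batch => execa('yarn', ['xo', '--fix', ...batch], {
    env: {NODE_OPTIONS: `--max-old-space-size=${maxOldSpaceMb}`}
  }),
  {concurrency: 2}
);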

It could also be the case that we disregard the typical amount of "free RAM" entirely. If a device has 16GB of RAM, just use it all up; that will slow the system down, but it may reduce the total time it takes to run XO.

Yet another idea: maybe we can directly request that the GC run: https://stackoverflow.com/questions/27321997/how-to-request-the-garbage-collector-in-node-js-to-run
We could trigger GC after every call to ESLint. Unless there are legitimate memory leaks, or a legitimate need for a huge amount of memory, this should also solve the problem.
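
For reference, a minimal sketch of what that could look like (global.gc() only exists when Node is started with --expose-gc; the per-file loop is just for illustration):

import {ESLint} from 'eslint';

// Run with: node --expose-gc ./lint-with-gc.mjs src/foo.ts src/bar.ts
const eslint = new ESLint();

for (const file of process.argv.slice(2)) {
  const results = await eslint.lintFiles([file]);
  const problems = results.flatMap(result => result.messages).length;
  console.log(`${file}: ${problems} problems`);

  // Ask V8 to collect garbage between files; only defined under --expose-gc.
  if (global.gc) {
    global.gc();
  }
}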

@devinrhode2 I’m glad you’re enthusiastic about finding generic ways to improve XO’s memory usage. However, my concern is that you’re making this report harder to follow by presenting generic ideas that aren’t related to the specific regression introduced by a specific XO commit that I reported here. Would you be willing to move your ideas to a new issue? Thanks.