error UNHANDLED EXCEPTION write EPIPE while running on Netlify
wahidshafique opened this issue · 103 comments
Preliminary Checks
- This issue is not a duplicate. Before opening a new issue, please search existing issues: https://github.com/gatsbyjs/gatsby/issues
- This issue is not a question, feature request, RFC, or anything other than a bug report directly related to Gatsby. Please post those things in GitHub Discussions: https://github.com/gatsbyjs/gatsby/discussions
Description
When trying to run the latest Gatsby (4.9.1), I get this error:
4:35:23 PM: success write out requires - 0.005s
4:36:27 PM: success Building production JavaScript and CSS bundles - 63.377s
4:37:03 PM: success Building HTML renderer - 36.060s
4:37:03 PM: success Execute page configs - 0.380s
4:37:03 PM: success Caching Webpack compilations - 0.001s
4:37:03 PM: error (node:2227) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 end listeners added to [PassThrough]. Use emitter.setMaxListeners() to increase limit
4:37:03 PM: (Use `node --trace-warnings ...` to show where the warning was created)
4:37:07 PM: success run queries in workers - 3.599s - 37/37 10.28/s
4:37:09 PM: success Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs - 120.528s - 5/5 0.04/s
4:37:09 PM: error UNHANDLED EXCEPTION write EPIPE
4:37:09 PM:
4:37:09 PM:
4:37:09 PM: Error: write EPIPE
4:37:09 PM:
4:37:09 PM: - child_process:864 ChildProcess.target._send
4:37:09 PM: node:internal/child_process:864:20
4:37:09 PM:
4:37:09 PM: - child_process:737 ChildProcess.target.send
4:37:09 PM: node:internal/child_process:737:19
4:37:09 PM:
4:37:09 PM: - index.js:291 WorkerPool.sendMessage
4:37:09 PM: [repo]/[gatsby-worker]/dist/index.js:291:19
4:37:09 PM:
4:37:09 PM: - worker-messaging.ts:22
4:37:09 PM: [repo]/[gatsby]/src/utils/jobs/worker-messaging.ts:22:22
4:37:09 PM:
4:37:09 PM:
4:37:09 PM: not finished Merge worker state - 0.058s
4:37:09 PM: error Command failed with exit code 1.
4:37:09 PM: info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
4:37:09 PM:
4:37:09 PM: ────────────────────────────────────────────────────────────────
4:37:09 PM: "build.command" failed
4:37:09 PM: ────────────────────────────────────────────────────────────────
4:37:09 PM:
4:37:09 PM: Error message
4:37:09 PM: Command failed with exit code 1: yarn postinstall && yarn build:incremental
4:37:09 PM:
4:37:09 PM: Error location
4:37:09 PM: In build.command from netlify.toml:
4:37:09 PM: yarn postinstall && yarn build:incremental
4:37:09 PM:
I cleared my Netlify cache before running this, and it seems to be a new issue as per #33738 (comment).
Reproduction Link
tbd
Steps to Reproduce
Following steps outlined here: #33738
Expected Result
For the site to build as it normally did before I updated packages.
Actual Result
The build consistently fails.
Environment
System:
OS: macOS 12.0.1
CPU: (12) x64 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Shell: 5.8 - /bin/zsh
Binaries:
Node: 17.4.0 - /var/folders/tb/qb7x5sw53vngt06y9g92csn80000gp/T/yarn--1646432031346-0.9788514151471666/node
Yarn: 1.22.17 - /var/folders/tb/qb7x5sw53vngt06y9g92csn80000gp/T/yarn--1646432031346-0.9788514151471666/yarn
npm: 8.3.1 - ~/.nvm/versions/node/v17.4.0/bin/npm
Languages:
Python: 2.7.18 - /usr/bin/python
Browsers:
Chrome: 99.0.4844.51
Safari: 15.1
Config Flags
GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true
Links/References
https://answers.netlify.com/t/error-unhandled-exception-write-epipe/52650
https://answers.netlify.com/t/gatsby-v4-works-locally-but-timed-out-on-netlify/46339/2
https://answers.netlify.com/t/gatsby-v4-works-locally-but-timed-out-on-netlify/46339/21
I can confirm that I also have this issue with v4.9.1, and I previously experienced it with v4.8.0.
Update: The current solution that appears to work for most people is to pin Gatsby to v4.7.2:
yarn add gatsby@4.7.2 or npm install gatsby@4.7.2
or you could make this change in your package.json file:
"gatsby": "4.7.2",
You can keep all your Gatsby plugins on the latest version (4.11.1).
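For example, a minimal package.json sketch (the surrounding entries are hypothetical; the important part is the exact "4.7.2" with no ^ or ~ range prefix, so the package manager can never resolve anything newer):
{
  "dependencies": {
    "gatsby": "4.7.2",
    "gatsby-plugin-sharp": "^4.11.1",
    "gatsby-plugin-image": "^2.11.1"
  }
}
Then run yarn install (or npm install) and commit the updated lockfile so Netlify installs the pinned version.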
It has been breaking for me on v4.9.0 all week... I got it to build once after setting the ENVs suggested in this issue, but it is now breaking again 😶🌫️
I'm facing the same problem after upgrading plugins to the latest version. After a few tries, I finally gave up and am now using GitHub Actions to deploy to Netlify instead.
I am getting the same issue with v4.9.1
Hi!
As with the older issue, you'll need to provide a minimal reproduction for us so that we can help with this further.
> As with the older issue, you'll need to provide a minimal reproduction for us so that we can help with this further.
I'm not sure how you can provide a minimal reproduction in this case because I think this issue is possibly related to memory and it happens specifically with Netlify. I have not experienced this issue locally or on Gatsby Cloud.
Also, I don't experience this error with every build, sometimes it builds and many times it doesn't.
I just know that this started with v4.8.0
My fix for this, as advised by Netlify, was to pin to "gatsby": "4.7"
I agree with @t2ca about reproduction constraints here. I don't have any capacity at the moment, but I was thinking that the Netlify build image could be used. I believe @benlavalley already tried this in some capacity (and Ben, if you're reading this, it'd be awesome if you have the repro available somewhere). If I recall, the 3 GB memory constraint was the thing that was causing failed builds, and Netlify has fluctuating allocation for the runners, so you could get north of that guarantee and see no issues (hence the intermittent nature of this problem).
> As with the older issue, you'll need to provide a minimal reproduction for us so that we can help with this further.
> I'm not sure how you can provide a minimal reproduction in this case because I think this issue is possibly related to memory and it happens specifically with Netlify. [...] I just know that this started with v4.8.0
Agreed; not sure how to reproduce - it works locally and even in my preview build on Netlify. I also rolled back to v4.7.0 for the time being.
I also have the same issue on a relatively small blog-type site I administer for a friend that uses NetlifyCMS and Gatsby, and builds and deploys on Netlify.
Currently, with v4.6.2 it builds fine both locally and on Netlify.
Updating to v4.9.2, it builds fine locally but the Deploy Preview results in the same UNHANDLED EXCEPTION write EPIPE error.
1:01:19 PM: success write out requires - 0.004s
1:02:15 PM: success Building production JavaScript and CSS bundles - 55.644s
1:02:46 PM: success Building HTML renderer - 31.623s
1:02:46 PM: success Execute page configs - 0.036s
1:02:46 PM: success Caching Webpack compilations - 0.000s
1:02:51 PM: success run queries in workers - 4.233s - 33/33 7.80/s
1:03:04 PM: success Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs - 107.420s - 60/60 0.56/s
1:03:04 PM: error UNHANDLED EXCEPTION write EPIPE
1:03:04 PM:
1:03:04 PM:
1:03:04 PM: Error: write EPIPE
1:03:04 PM:
1:03:04 PM: - child_process:866 ChildProcess.target._send
1:03:04 PM: node:internal/child_process:866:20
1:03:04 PM:
1:03:04 PM: - child_process:739 ChildProcess.target.send
1:03:04 PM: node:internal/child_process:739:19
1:03:04 PM:
1:03:04 PM: - index.js:291 WorkerPool.sendMessage
1:03:04 PM: [repo]/[gatsby-worker]/dist/index.js:291:19
1:03:04 PM:
1:03:04 PM: - worker-messaging.ts:22
1:03:04 PM: [repo]/[gatsby]/src/utils/jobs/worker-messaging.ts:22:22
1:03:04 PM:
1:03:04 PM:
1:03:04 PM: not finished Merge worker state - 0.052s
1:03:05 PM:
1:03:05 PM: ────────────────────────────────────────────────────────────────
1:03:05 PM: "build.command" failed
1:03:05 PM: ────────────────────────────────────────────────────────────────
1:03:05 PM:
1:03:05 PM: Error message
1:03:05 PM: Command failed with exit code 1: gatsby build
1:03:05 PM:
1:03:05 PM: Error location
1:03:05 PM: In Build command from Netlify app:
1:03:05 PM: gatsby build
Here's the output from gatsby info locally:
System:
OS: macOS 12.2.1
CPU: (4) x64 Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Shell: 5.8 - /bin/zsh
Binaries:
Node: 16.14.0 - /usr/local/bin/node
npm: 8.5.3 - /usr/local/bin/npm
Languages:
Python: 2.7.18 - /usr/bin/python
Browsers:
Chrome: 99.0.4844.51
Safari: 15.3
npmPackages:
gatsby: ^4.9.2 => 4.9.2
gatsby-plugin-catch-links: ^4.9.0 => 4.9.0
gatsby-plugin-feed-generator: ^2.0.5 => 2.0.5
gatsby-plugin-google-fonts: ^1.0.1 => 1.0.1
gatsby-plugin-google-gtag: ^4.9.0 => 4.9.0
gatsby-plugin-image: ^2.9.0 => 2.9.0
gatsby-plugin-manifest: ^4.9.0 => 4.9.0
gatsby-plugin-netlify: ^4.1.0 => 4.1.0
gatsby-plugin-netlify-cms: ^6.9.0 => 6.9.0
gatsby-plugin-offline: ^5.9.0 => 5.9.0
gatsby-plugin-react-helmet: ^5.9.0 => 5.9.0
gatsby-plugin-sass: ^5.9.0 => 5.9.0
gatsby-plugin-sharp: ^4.9.0 => 4.9.0
gatsby-plugin-sitemap: ^5.9.0 => 5.9.0
gatsby-remark-embed-video: ^3.1.1 => 3.1.1
gatsby-remark-external-links: 0.0.4 => 0.0.4
gatsby-remark-images: ^6.9.0 => 6.9.0
gatsby-remark-relative-images: ^2.0.2 => 2.0.2
gatsby-remark-responsive-iframe: ^5.9.0 => 5.9.0
gatsby-source-filesystem: ^4.9.0 => 4.9.0
gatsby-transformer-remark: ^5.9.0 => 5.9.0
gatsby-transformer-sharp: ^4.9.0 => 4.9.0
npmGlobalPackages:
gatsby-cli: 4.9.0
I have tried some needed maintenance, but with no success:
- Cleared cache and retried
- Replaced the deprecated Netlify plugin Gatsby Cache with Essential Gatsby
- Optimised source images (only a couple of images were largish anyway)
- Updated the Netlify build image from Xenial to Focal
Will stick with v4.6.2 for now, but it is another instance of v4.9.x consistently failing to build on Netlify.
Disabling webp image generation, as @netlify support apparently recommended, did not resolve the build errors, while pinning gatsby 4.7.2 did.
> Disabling webp image generation, as @netlify support apparently recommended, did not resolve the build errors, while pinning gatsby 4.7.2 did.
Thanks for the update. I also tried some of the suggestions like reducing the size of the images and nothing worked.
I think that the solution for now is to pin Gatsby to v4.7
Update: I've seen this issue on and off for a few weeks, so no promises if this is a surefire fix, but I add the following environment variable to force the process to reduce memory usage and it seems to help:
NODE_OPTIONS = --max-old-space-size= 4096
I wish that someone on their support team would have told me this or suggested another workaround. Our build times have also been fluctuating wildly so not sure if there are other variables at play.
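For anyone unsure where to put this: one place is netlify.toml (a sketch of the standard Netlify build-environment block; the same variable can also be set in the Netlify UI under Site settings > Build & deploy > Environment):
[build.environment]
  # No space inside the value - a space would split it into separate Node options
  NODE_OPTIONS = "--max-old-space-size=4096"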
Hmm, adding this didn't work for me. But do the build logs show which NODE_OPTIONS are set? I didn't see any good docs for setting that on Netlify...
sharp v0.30.1, which was upgraded from v0.29.3 in 326a483, requires libvips v8.12.2. Netlify's build environment comes with libvips v8.9.1. Maybe upgrading to sharp v0.30.1 (326a483) introduced an incompatibility?
Update: This seems to be wrong. Overriding the resolution of sharp to v0.29.3 did not solve the problem.
I got the issue again, even when using Gatsby v4.7.0. I had updated gatsby-plugin-sharp to v4.9.1. Rolling gatsby-plugin-sharp back to v4.9.0 solved the issue.
10:51:55 AM: success Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs - 188.098s - 155/155 0.82/s
10:51:55 AM: error UNHANDLED EXCEPTION write EPIPE
10:51:55 AM:
10:51:55 AM:
10:51:55 AM: Error: write EPIPE
10:51:55 AM:
10:51:55 AM: - child_process.js:841 ChildProcess.target._send
10:51:55 AM: internal/child_process.js:841:20
10:51:55 AM:
10:51:55 AM: - child_process.js:712 ChildProcess.target.send
10:51:55 AM: internal/child_process.js:712:19
10:51:55 AM:
10:51:55 AM: - index.js:291 WorkerPool.sendMessage
10:51:55 AM: [repo]/[gatsby-worker]/dist/index.js:291:19
10:51:55 AM:
10:51:55 AM: - worker-messaging.ts:22
10:51:55 AM: [repo]/[gatsby]/src/utils/jobs/worker-messaging.ts:22:22
10:51:55 AM:
10:51:55 AM:
10:51:55 AM: not finished Merge worker state - 0.046s
10:51:55 AM: npm ERR! code ELIFECYCLE
10:51:55 AM: npm ERR! errno 1
10:51:55 AM: npm ERR! portfolio@ build:gatsby build
10:51:55 AM: npm ERR! Exit status 1
10:51:55 AM: npm ERR!
10:51:55 AM: npm ERR! Failed at the portfolio@ build script.
10:51:55 AM: npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
Hey there. I've been having a similar issue for the past few days. I'm still a webdev noob and I'm having trouble figuring out how to roll back to specific versions of gatsby and other plugins. Can you tell me how to do this?
> Hey there. I've been having a similar issue for the past few days. I'm still a webdev noob and I'm having trouble figuring out how to roll back to specific versions of gatsby and other plugins. Can you tell me how to do this?
All you have to do is yarn add gatsby@4.7.2 or npm install gatsby@4.7.2, or you could make this change in your package.json:
"gatsby": "4.7.2",
You can keep all your plugins on the latest version (4.9.3).
gotcha. thank you!
To add to potential solutions coming in from Netlify, there is talk of lowering GATSBY_CONCURRENT_DOWNLOAD to something like 16.
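As a sketch, that experiment would look like this in netlify.toml (the value 16 is just the suggestion above, not an officially recommended number):
[build.environment]
  GATSBY_CONCURRENT_DOWNLOAD = "16"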
Is there anyone who is willing to invite me to their github repository and netlify account by any chance? Contact me at ward@gatsbyjs.com to have a look
I'm having the same issue, but even though I use Netlify to host my pages, I use GitHub Actions to build and deploy everything from there to Netlify. I'm using the Netlify CLI to do that within my actions: npx netlify deploy --build --message "Deploy from GitHub Actions" --prod (or npx netlify build --offline to test whether the build works in general), but the same issue happens in my GitHub Actions as well. So the whole Netlify build infrastructure is not involved at all here.
Setting GATSBY_CONCURRENT_DOWNLOAD to 15 and GATSBY_CPU_COUNT to 1 did not help. Downgrading Gatsby to 4.7.x did not really work either, because with that I'm getting TypeScript compile errors for node_modules/gatsby-plugin-utils/dist/has-feature.d.ts, caused by the transitive dependency gatsby-plugin-utils, which was also updated and now breaks.
The reason I'm only seeing this issue today and not earlier is that I switched my hero image from <StaticImage /> to <GatsbyImage />, which should be the same regarding image processing, right? 🤔
@wardpeet my blog is open source and suffering from this issue (in case that's enough): https://github.com/browniebroke/browniebroke.com
@wardpeet my blog is also open source should you need access. Just let me know.
> Update: I've seen this issue on and off for a few weeks [...] I add the following environment variable to force the process to reduce memory usage and it seems to help: NODE_OPTIONS = --max-old-space-size= 4096
Thanks!
I have been experiencing the same issue on Gatsby v4.9.x. I downgraded to v4.1.3 (per another thread) and added ENV variables, but no luck.
I have been experiencing this issue as well while migrating a site from gatsby 2 to 4. Builds work locally but fail 9 times out of 10 on netlify.
4:35:42 PM: success run queries in workers - 56.748s - 256/256 4.51/s
4:36:26 PM: success Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs - 100.587s - 198/198 1.97/s
4:36:26 PM: error UNHANDLED EXCEPTION write EPIPE
4:36:26 PM:
4:36:26 PM:
4:36:26 PM: Error: write EPIPE
4:36:26 PM:
4:36:26 PM: - child_process:864 ChildProcess.target._send
4:36:26 PM: node:internal/child_process:864:20
4:36:26 PM:
4:36:26 PM: - child_process:737 ChildProcess.target.send
4:36:26 PM: node:internal/child_process:737:19
4:36:26 PM:
4:36:26 PM: - index.js:298 WorkerPool.sendMessage
4:36:26 PM: [repo]/[gatsby-worker]/dist/index.js:298:19
4:36:26 PM:
4:36:26 PM: - worker-messaging.ts:22
4:36:26 PM: [repo]/[gatsby]/src/utils/jobs/worker-messaging.ts:22:22
4:36:26 PM:
4:36:26 PM:
4:36:26 PM: not finished Merge worker state - 0.204s
I have tried pinning gatsby to 4.1.3, 4.6.2, 4.7.2, and 4.9.3, as suggested in the various threads dealing with this issue, all without success.
I have also tried various values for the env variables GATSBY_CONCURRENT_DOWNLOAD, GATSBY_CPU_COUNT, and NODE_OPTIONS = --max-old-space-size= 4096, without success.
I have upgraded my Netlify build image and profiled my local memory usage to be sure memory use was not spiking over 3 GB, and have significantly decreased bundle size by optimizing images stored in the repo rather than in a CMS.
I have also experimented with rolling back versions of gatsby-plugin-image, gatsby-plugin-sharp, gatsby-transformer-sharp, and sharp, under the theory that this issue has something to do with processing static images (similar to benlavalley's experience here: #33738 (comment)). (I maintain another site with very few static images, and it has been on gatsby 4.4 with no issue for months.)
The options at this point seem to be rolling back to Gatsby 3, rearchitecting the site to host all images in a CMS, or finding a replacement for Netlify.
> find a replacement for Netlify
The pessimist in me thinks this is a feature, not a bug.
> I have been experiencing this issue as well while migrating a site from gatsby 2 to 4. Builds work locally but fail 9 times out of 10 on netlify. [...] The options at this point seem to be rolling back to Gatsby 3, rearchitecting the site to host all images in a CMS, or finding a replacement for Netlify.
Depending on how much static content you have: one thing I found I can do is reduce it significantly, which allows my site to build and deploy. I then check it all back in, and because of what I suspect to be build caching, my site is then able to deploy.
It's not just about images, it seems, but the overall amount of content Gatsby is processing.
Because Netlify states they aren't capable of changing memory limits, and Gatsby v4 has been out in the wild for quite some time now and doesn't seem likely to reduce its own memory usage, I'm probably going to have to bail on Netlify as well :(
Edit: A couple of builds later, the problem is back.
Original answer:
I had the same issue. I've just done two things simultaneously:
- set the environment variable NODE_OPTIONS = --max-old-space-size=4096 (without a space, as a space separates different Node options - maybe that's somehow relevant in the answers above?)
- updated the Essential Gatsby plugin (@netlify/plugin-gatsby) on Netlify from v1 to v2 - that's probably not relevant, because I already had the warning that my plugin was outdated, and the build was fine.
These steps fixed the issue for me, at least for the current build...
I made a simple GitHub Action to solve this problem. If anyone is interested, check it out here: https://github.com/thundermiracle/netlify-deploy
I've already switched all my GatsbyJS sites from Netlify's CD to this one.
Just chiming in to say that I just started getting this too after upgrading and losing my build cache. Trying a few things to fix it. I can confirm that it is building fine on Gatsby Cloud.
cc @ascorbic - have you seen this issue? Wondering if you have any thoughts given your background and work on the Gatsby and Netlify integration.
> Just chiming in to say that I just started getting this too after upgrading and losing my build cache. [...] I can confirm that it is building fine on Gatsby Cloud.
Hey @ehowey, I have had this issue for over a month now and I've pretty much tried everything that was suggested.
The only solution that worked for me is to pin Gatsby to v4.7.2
It's still possible to get a successful build with the latest version of Gatsby but it's not consistent.
My understanding is that the issue has to do with memory and the fact that Netlify has a 3 GB limit.
You shouldn't experience this issue locally or on Gatsby Cloud.
I've spent some time investigating this (thanks @browniebroke for the great repro) and have found out a few things. First, I don't think it's memory: I can reproduce it every time on the browniebroke site using Netlify's enterprise High Performance builds, which have 32GB available, where I have allocated 20GB to node. It fails at the same point, whether using regular builds or high perf. It fails on the same image each time, but removing that image means it fails on another image, so I'm not sure if that's the issue. The error can be prevented from aborting the build by adding an error-handler as the second argument here: https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-worker/src/index.ts#L406
I don't know if this is ok. The message being sent is JOB_COMPLETED, which fails because the worker has closed the channel. When I test it, the generated site seems fine. If this is acceptable, I'm happy to open a PR. What do you think, @wardpeet?
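Roughly, the idea looks like this (an illustrative sketch, not the actual gatsby-worker source; Node's ChildProcess.send() accepts a callback as its final argument, which turns a failed write into a handled error instead of an uncaught EPIPE exception):
import type { ChildProcess } from "child_process"

// Illustrative sketch: send with an error callback so a closed channel
// becomes a logged warning instead of crashing the whole build
function sendMessageSafely(worker: ChildProcess, msg: unknown, workerId: number): void {
  worker.send(msg, (error: Error | null) => {
    if (error) {
      // The worker already closed its IPC channel (e.g. it is shutting down)
      console.warn(`failed to send message to worker ${workerId}: ${error.message}`)
    }
  })
}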
I would certainly like to find out what happened after 4.7.2 that caused this. I don't know if the error is in gatsby-plugin-sharp, gatsby-worker, or something else.
One possible clue is that on my M1 Mac I have very occasionally managed to make it fail around the same place, and this time it has a different error. I don't know if it's connected, but it is interesting that it happens in a similar place:
MDB_PROBLEM: Unexpected problem - txn should abort
Error: MDB_PROBLEM: Unexpected problem - txn should abort
- open.js:156
[browniebroke.com]/[lmdb]/open.js:156:21
- write.js:761 LMDBStore.transactionSync
[browniebroke.com]/[lmdb]/write.js:761:17
- open.js:155 new LMDBStore
[browniebroke.com]/[lmdb]/open.js:155:11
- open.js:218 LMDBStore.openDB
[browniebroke.com]/[lmdb]/open.js:218:6
- cache-lmdb.ts:60 GatsbyCacheLmdb.getDb
[browniebroke.com]/[gatsby]/src/utils/cache-lmdb.ts:60:44
- cache-lmdb.ts:69 GatsbyCacheLmdb.get
[browniebroke.com]/[gatsby]/src/utils/cache-lmdb.ts:69:17
- index.js:382 cachifiedProcess
[browniebroke.com]/[gatsby-plugin-sharp]/index.js:382:30
- index.js:396 base64
[browniebroke.com]/[gatsby-plugin-sharp]/index.js:396:18
- index.js:573 fluid
[browniebroke.com]/[gatsby-plugin-sharp]/index.js:573:25
ERROR #85928
An error occurred during parallel query running.
Go here for troubleshooting tips: https://gatsby.dev/pqr-feedback
Error: Worker exited before finishing task
- index.js:117 ChildProcess.<anonymous>
[browniebroke.com]/[gatsby-worker]/dist/index.js:117:45
- node:events:390 ChildProcess.emit
node:events:390:28
- child_process:290 Process.ChildProcess._handle.onexit
node:internal/child_process:290:12
The relevant change for the MDB_PROBLEM could be that we updated LMDB from v1 to v2: #34576. The lmdb version is no longer locked to 2.2.1; it's ^2 now.
Thanks for the investigation!
Thanks @LekoArts ! I've tried using yarn resolutions to pin lmdb to 2.2.1 and it works! Do you know what could be causing this regression?
Ugh. Scrap that. No, it still fails.
> The error can be prevented from aborting the build by adding an error-handler as the second argument here: https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-worker/src/index.ts#L406
> I don't know if this is ok. The message being sent is JOB_COMPLETED, which fails because the worker has closed the channel. When I test it, the generated site seems fine. If this is acceptable, I'm happy to open a PR.
The reason we send JOB_COMPLETED back to workers is that something might await on some image processing, and this would trigger .then on the job promise in user/plugin code.
For the vast majority of cases we don't (need to) await, hence "site seems fine", but in general this seems like just another symptom of the process exiting in an unexpected way, while gatsby expects it to still be alive.
Just for reference, our setup for running queries in workers:
gatsby/packages/gatsby/src/commands/build.ts
Lines 294 to 301 in 82e3c8a
First we trigger query running, then we await jobs finishing (so we can send those JOB_COMPLETED messages), and only then do we restart worker processes (to dump some memory that was allocated during query running), so workers should stay alive for those messages unless something else crashes them.
@pieh can you think of a way to get a more helpful error message so we can see what's actually causing the worker to exit, or any other way to track down the cause of the regression? My instinct still says lmdb, but it would be good to know.
Just thinking more on this: process.send can buffer messages if there is a lot of messaging, so maybe we end up in a situation where messages get buffered, we don't await for them to flush (our process.send there is launch-and-forget now), and we try to restart/kill workers before all the messages were actually flushed?
Maybe we need something like:
gatsby/packages/gatsby-plugin-gatsby-cloud/src/ipc.js
Lines 1 to 30 in 82e3c8a
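Something in this direction, perhaps (an illustrative sketch of the flush-tracking idea, not the actual ipc.js code; process.send also accepts a completion callback):
// Count in-flight IPC messages and expose a promise that resolves
// once everything has actually been flushed to the channel
let inFlightMessages = 0
let onAllFlushed: (() => void) | null = null

export function trackedSend(msg: unknown): void {
  inFlightMessages++
  process.send!(msg, () => {
    // Called once Node has flushed this message to the IPC channel
    inFlightMessages--
    if (inFlightMessages === 0 && onAllFlushed) {
      onAllFlushed()
      onAllFlushed = null
    }
  })
}

export function waitUntilAllMessagesSent(): Promise<void> {
  if (inFlightMessages === 0) return Promise.resolve()
  return new Promise(resolve => {
    onAllFlushed = resolve
  })
}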
> @pieh can you think of a way to get a more helpful error message so we can see what's actually causing the worker to exit
We can add some additional logging (code/signal) in the worker process "exit" handler:
gatsby/packages/gatsby-worker/src/index.ts
Lines 196 to 207 in 82e3c8a
Right now it only shows something if the worker exited while it was working on a task (like running a query or HTML generation) - if the worker stays "idle" after query running, just waiting for jobs-related IPC messages, then we wouldn't get a trace if it exited.
On the worker side of things, we could add a signal-exit handler to add some logs as well.
> The only solution that worked for me is to pin Gatsby to v4.7.2
Hey, I'm also experiencing the same issue (v4.11.0), but I'm still a little green - could anyone explain what is meant by "pin Gatsby to v4.7.2"? Do you mean to downgrade the Gatsby version, and if so, what's the best failsafe way to go about doing this?
Thanks!
@AndrewNow Lock it in package.json, like "gatsby": "=v4.7.2"
@benlavalley thanks, rolling back to 4.7.0 worked for me.
"=v..."
isn't a thing is it, @benlavalley, in npm semantic versioning?
Presumably you mean simply, as mentioned earlier:
"gatsby": "4.7.2"
@AndrewNow then you'd run npm i
afterward. Alternatively npm i gatsby@4.7.2
would achieve the same thing.
+1 rolling back to 4.7.2 appears to resolve this for me.
Still experiencing this on 4.11.0 on Netlify. Just wondering if progress is being made on a fix for this at all? Rolling back to 4.7 isn't the best option, but it's still an option.
I can, however, confirm that removing AVIF from the image formats has fixed this for me, as the build was falling over on the image-processing job. This is fine really, as AVIF is new and not in all browsers yet (https://caniuse.com/avif), and I added it more as a "oh that's cool" kind of thing.
// before
gatsbyImageData(width: 740, placeholder: BLURRED, formats: [AUTO, WEBP, AVIF], layout: CONSTRAINED)
// after
gatsbyImageData(width: 740, placeholder: BLURRED, formats: [AUTO, WEBP], layout: CONSTRAINED)
Same issue, have to use 4.7.2 as temporary fix.
I was experiencing the same issue on Gatsby 4.11.1. I was able to fix it by compressing my newest blog post images and using Gatsby 4.7.2.
I've had some decent success with gatsby 4.11.x by explicitly unsetting AVIF generation in sharp's config
gatsby-config.js
{
resolve: `gatsby-plugin-sharp`,
options: {
defaults: {
formats: [`webp`, `auto`],
quality: 90,
},
},
},
Note that this error happens on a clean checkout of gatsby-starter-wordpress-blog.
> I've had some decent success with gatsby 4.11.x by explicitly unsetting AVIF generation in sharp's config [...]
Thanks, will also give this a try
I also had to downgrade gatsby (4.11.1 -> 4.7.2) to build on Netlify (as of today). No efforts to strip out avif formats seemed to help.
For what it's worth, this happens for me locally (macOS arm64) as well. Not just Netlify 😕
> For what it's worth, this happens for me locally (macOS arm64) as well. Not just Netlify 😕
I have reproduced it locally by cloning Netlify's build image and ensuring my memory limits match their production limits, which I believe were stated somewhere to be 2 GB (https://github.com/netlify/build-image).
Probably correlated. I’m running a stock M1 with far more memory saturation than recommended (swap > total memory at this point) so I could see that being a factor 😅
reverting to 4.7.2 was my fix here as well (locally and on Netlify)
I ran into the same issue and "solved" it by not using Netlify to compile my Gatsby website.
Instead I use this GitHub workflow:
name: Deploy Build
on: workflow_dispatch
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Clone Repo
        uses: actions/checkout@v1
      - name: Install Dependencies
        run: yarn install
      # - name: Type Check
      #   run: yarn run type_check
      - name: Build
        run: yarn run build_deploy
      - name: Deploy to GitHub Pages
        uses: JamesIves/github-pages-deploy-action@4.1.4
        with:
          branch: deploy
          folder: public
It compiles everything into the deploy branch, which Netlify publishes. I gave Netlify these build settings: the build command is simply true.
true is a program that, according to its man page:
NAME
       true - do nothing, successfully
DESCRIPTION
       Exit with a status code indicating success.
This at least works as an intermediate solution until the problem is fixed.
> I've had some decent success with gatsby 4.11.x by explicitly unsetting AVIF generation in sharp's config [...] Thanks, will also give this a try
I tried this and it didn't seem to help. The only way I can get my site to build is by deploying with minimal static content, adding my static content back and redeploying (after Netlify has cached my previous deploy). This works for about 3 deploys, and then fails until I do it all over again. So annoying.
Disabling webp fully worked for me.
I had the same issue and have been trying to fix it different ways, but I didn't want to roll back Gatsby to previous versions. Currently I'm using Gatsby v4.12.1. @wahidshafique Try the "Essential Gatsby" Netlify plugin at the latest version (3.0.0). This helped me solve the issue.
I have a quite image-heavy Gatsby site, with dozens of unoptimised 4K-resolution images. After trying to disable AVIF generation and downgrading to 4.7.2, neither of which worked, I was finally able to get a deploy by using the Netlify Gatsby plugin (https://github.com/netlify/netlify-plugin-gatsby), per @Dmitry-Komkov's recommendation, while still being on 4.7.2. I tried to also upgrade to the latest Gatsby version, but that broke the deploy again...
It appears incontrovertible that this error relates to memory usage. And many claim that freezing Gatsby at version 4.7.0 works.
Dumb question: has anyone identified the precise diff between 4.7.0 and later versions that changes memory usage so much?
Would be great to solve this. Really frustrating.
> It appears incontrovertible that this error relates to memory usage. [...] Would be great to solve this. Really frustrating.
I reverted to 4.7.0. (After trying everything else in this thread.)
Now it builds and deploys successfully at Netlify.
I experienced this error today and locking Gatsby version to 4.7.2 fixed it for me.
Could it also be because of the package manager? While I was using npm with Gatsby 4.7.0 it failed, but yarn worked properly...
+1 for the revert to 4.7.2 fix after trying everything else here. One of my sites just completely stopped deploying altogether, even on branch and deploy previews where gatsby is >4.7.2.
OK, so I've spent some time looking at this, and it appears to be a race condition. In the section you quoted, @pieh:
> First we trigger query running, then we await jobs finishing (so we can send those JOB_COMPLETED messages), and only then do we restart worker processes (to dump some memory that was allocated during query running), so workers should stay alive for those messages unless something else crashes them
...it looks like there are JOB_COMPLETED messages being dispatched after restart() has been called. I forked gatsby-worker and added a load of logging, and you can see it here. The message "about to end worker" is added at the beginning of workerPool.end() here. You can see that the JOB_COMPLETED message is dispatched afterwards.
If I put a 1-second await inside end() then it works fine. See here, where the message is dispatched while it waits before restarting.
I'm not sure why waitUntilAllJobsComplete is resolving before that final job is completed, but that seems to be the issue.
That's great news @ascorbic. Would this be a forthcoming update to the Netlify plugin, or is there another way to consume it?
@stevepepple This is a bug in Gatsby even if it mostly manifests on Netlify, so the fix needs to be in Gatsby. I will continue and see if I can work out a fix, but @wardpeet and @pieh have more context on that part of the codebase so may have a better idea.
> This is a bug in Gatsby even if it mostly manifests on Netlify, so the fix needs to be in Gatsby. [...]
Understood and much appreciated!
> This is a bug in Gatsby even if it mostly manifests on Netlify, so the fix needs to be in Gatsby. [...]
Agreed. This is an issue to be addressed in Gatsby code. Anything else is a band-aid.
@ascorbic, your research is appreciated.
I'm going to continue looking at this tomorrow, so hopefully I can work out a fix. If so I'll open a PR. I'm always happy to contribute a fix if I can.
Further progress. It is a race condition.
waitUntilAllJobsComplete waits for hasActiveJobs to resolve, which happens when activeJobs is 0. This happens inside enqueueJob here:
gatsby/packages/gatsby/src/utils/jobs/manager.ts
Lines 314 to 319 in 00220f4
Once that resolves, the workerPool is restarted:
gatsby/packages/gatsby/src/commands/build.ts
Lines 298 to 300 in 00220f4
The trouble is that at this point, the final JOB_COMPLETED message has not been dispatched. The problem is where this happens, inside initJobsMessagingInMainProcess. The job is created here by dispatching createJobV2FromInternalJob and then waiting for the result:
gatsby/packages/gatsby/src/utils/jobs/worker-messaging.ts
Lines 20 to 25 in 00220f4
The problem is that hasActiveJobs has already resolved by this point, because it happened inside enqueueJob (which was itself called when dispatching createJobV2FromInternalJob). This means that the workerPool is already being restarted, and we're about to dispatch an end message to a worker that is currently shutting down.
There are a few ways we could fix this. My favoured one would be to dispatch JOB_COMPLETED inside runJob, rather than waiting for Redux. This would ensure that the message has been sent before activeJobs is decremented and hasActiveJobs resolves. If this sounds reasonable, I can open a PR to move this from initJobsMessagingInMainProcess into runJob (probably something like runLocalWorker(worker[job.name], job).then(() => sendTheJobCompletedMessageOrSomething())). What do you reckon?
I realise of course that this approach won't work, because at that point it doesn't have a reference to the worker to send the message. I'll think of another approach.
Uff nice debugging Matt ;)
I think the suggestion makes sense in general; the only thing I would think about is that inside runJob we don't have context on whether we should dispatch it, and if so, where to.
Haha. Ten seconds before you. Where do you think would be a good place for this?
We had a chat about this internally, and there is more to unpack here (not just the race condition described above).
The whole reason we even send JOB_COMPLETED back from the main process to the worker is that createJobV2 returns a promise that can be either awaited on or just .then()/.catch()ed, so plugins can do something after a job is completed, and we wanted to preserve that.
In the await case it's a non-problem (generally), because jobs are usually created in resolvers, so the whole query running will be "blocked" until awaited jobs are finished anyway.
The .then()/.catch() case (or when a resolver doesn't await on a job but still wants to do something) is more problematic (and the reason for the separate waitForJobsCompleted()). We do use it in gatsby-plugin-sharp to log a message now:
gatsby/packages/gatsby-plugin-sharp/src/index.js
Lines 136 to 145 in 51cd9be
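For readers following along, the proxy-promise mechanism on the worker side looks roughly like this (an approximate sketch; names and message shapes are illustrative, not the exact Gatsby internals):
// Worker side: createJobV2 returns a promise that only settles when the
// main process answers over IPC
const pendingJobs = new Map<string, (result: unknown) => void>()

function createJobV2(job: { contentDigest: string }): Promise<unknown> {
  return new Promise(resolve => {
    pendingJobs.set(job.contentDigest, resolve)
    // Ask the main process to run/deduplicate the job
    process.send!([`JOB_CREATED`, job])
  })
}

process.on(`message`, ([type, payload]: [string, any]) => {
  // JOB_COMPLETED is the message the main process fails to write in the
  // EPIPE crash, because the worker has already closed this channel
  if (type === `JOB_COMPLETED`) {
    pendingJobs.get(payload.jobContentDigest)?.(payload.result)
    pendingJobs.delete(payload.jobContentDigest)
  }
})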
More interesting bits: when we do .restart() on the worker pool, we first just tell the worker to try to exit gracefully (by removing the message listener, which, assuming nothing else is still "running", should allow the process to exit)... but if it doesn't within 1 second, we send SIGKILL:
gatsby/packages/gatsby-worker/src/index.ts
Lines 266 to 287 in 51cd9be
+
gatsby/packages/gatsby-worker/src/child.ts
Lines 96 to 97 in 51cd9be
Given that there is a proxy promise alive in the worker (which gets resolved/rejected after receiving a message from the main process), that should keep the process alive until it's resolved/rejected - so maybe the whole problem can be fixed by adjusting the "forced kill" logic instead? A much longer grace period? I would also question whether the current graceful ending of the process even works now, or whether we basically always rely on the forced process kill.
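The restart sequence described above, as a rough sketch (approximate, not the literal gatsby-worker source):
import type { ChildProcess } from "child_process"

function endWorker(worker: ChildProcess): Promise<void> {
  return new Promise(resolve => {
    // Force-kill if the worker hasn't exited gracefully within 1 second
    const forceKill = setTimeout(() => worker.kill(`SIGKILL`), 1000)
    worker.once(`exit`, () => {
      clearTimeout(forceKill)
      resolve()
    })
    // Ask the worker to stop listening for IPC messages; an idle worker can
    // then exit on its own - after which its channel is closed and any
    // further send from the main process raises EPIPE
    worker.send([`END`])
  })
}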
Thanks for looking at this. I don't think the forced ending is the issue. If you look at these logs you can see that the forced kill has not happened yet when the error occurs. The problem is that the message catches the worker during the shutdown process. It sends END, which removes the listener to allow the graceful stop. When it starts to shut down, Node closes the IPC channel, meaning that any future writes will cause the EPIPE error, so it doesn't have the opportunity to complete, as the JOB_COMPLETED message fails. Allowing a graceful exit is the problem, because that exit is what triggers the attempt to dispatch the message, but by that time the channel has closed.
The solution would be to find a way to ensure either that END isn't dispatched until JOB_COMPLETED has been sent, or that the event listener isn't removed until it has. Alternatively, find another way to resolve the promise that doesn't require dispatching JOB_COMPLETED.
You are right...
Maybe the simplest solution is to add a tracker like the hasActiveJobs promise / activeJobs counter from https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/utils/jobs/manager.ts to https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/utils/jobs/worker-messaging.ts and await on it as well? So we await jobs, but we also await all pending messages being sent before resolving that promise (possibly the proper thing to do would be to await just those pending messages, and not even the jobs, before restarting the worker pool).
That sounds like a reasonable solution. So that would be handled in initJobsMessagingInMainProcess. Would you import that promise into manager.ts and await it in waitUntilAllJobsComplete, or have a separate function tracking this? I'm happy to do a PR for this if that sounds good.
I think we should export a function returning the current promise from worker-messaging.ts (the promise can be recreated later on, so we need a getter to get the new one, not just the original one), but we can await it inside the existing waitUntilAllJobsComplete (Promise.all?). This would delay waitUntilAllJobsComplete by potentially one or two setImmediate ticks (which currently happen after a job finishes and before messages are sent?), so it should be fine and not really noticeable, and it will be nice to share it with other places that await jobs (that feels much safer than doing it in just that one spot).
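A minimal sketch of that shape (illustrative names only; as the next comment explains, the await ultimately had to live in build.ts rather than manager.ts):
// worker-messaging.ts: recreate the promise whenever new messages start
// pending, and expose a getter so callers always await the current one
let pendingMessages: Promise<void> = Promise.resolve()
export const getPendingMessagesPromise = (): Promise<void> => pendingMessages

// build.ts: await both active jobs and any unsent JOB_COMPLETED messages
// before restarting the worker pool
await Promise.all([waitUntilAllJobsComplete(), getPendingMessagesPromise()])
await workerPool.restart()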
This is proving harder than I thought. Simply importing anything from worker-messaging into manager.ts causes builds to fail with TypeError: _actions.actions.apiFinished is not a function. I think this is because it imports Redux, meaning the store is set up in the wrong place. I may need to import it directly into build.ts and then await that alongside waitUntilAllJobsComplete before restarting the pool.
I have a draft PR up. Does the approach look ok? I've not tested it properly yet which is why it's still a draft, but I'll look into it tomorrow.
> Simply importing anything from worker-messaging into manager.ts causes builds to fail with TypeError: _actions.actions.apiFinished is not a function. I think this is because it imports Redux, meaning the store is set up in the wrong place.
Gosh, that's annoying - import hell :)
> I may need to import it directly into build.ts and then await that alongside waitUntilAllJobsComplete before restarting the pool
This sounds like a very pragmatic approach, and I approve :)
> I have a draft PR up. Does the approach look ok?
Yup - I might rename those variables to better match what they represent (so it's not confusing for future readers), but overall this is effectively what I had in mind.
I think the PR should go in regardless of whether it fixes the issue completely - it just makes sense on its own. If it's not sufficient, we will continue exploring, of course.
Hey folks, we just released the changes from #35513 - huge thanks to @ascorbic for debugging, figuring out possible solutions, and actually implementing one!
The above change is available in:
- @latest - gatsby@4.13.1
- @next (if you like "bleeding edge") - gatsby@4.14.0-next.2
Please do try it out and let us know whether you still see these kinds of errors with either of the above versions of Gatsby.
I'll re-open this issue (it was auto-closed when the linked PR was merged), but we do want to hear back from you folks on whether the issue was resolved for you with the above versions.
We'll keep it open for a week, and if there are no new reports, we'll close it then.
I just tested it out on a site that had issues with this bug with all versions prior to 4.13.1. I did three deploys in total and have not experienced any issues yet. Seems to work great!
So far so good! My v4 branch that was failing consistently now seems to be building consistently! Thank you @ascorbic!
Also now having success with 4.13.1 where the build was mostly failing in Netlify before (I rolled back to 4.7.2 while this was being looked at). Thanks!
Thank you! Works for us, too.
Tried to bail to Cloudflare Pages, but they don't have the same type of support for Gatsby functions 😅
Thank you! I've built several times successfully on Netlify.
I have had this problem since starting on 4.11.2 (I'm new to Gatsby and Netlify and couldn't figure out if I was doing something wrong!) and had to resort to building locally until this very moment. The release of 4.13.1 was the first time I could build successfully on their servers!
I updated Gatsby to v4 last year, and after 4.13.1, for the first time in months, I've been able to run a clean deploy without any crazy routine of clearing content from my site. Awesome work!