aws/aws-cdk

[aws-lambda-nodejs] a single lambda creation takes 30s every time

vvo opened this issue ยท 23 comments

vvo commented

tl;dr;

Node.js lambda are slow to build because CDK uses docker and mounting a complete project inside docker is super slow on macos.

There's work being done on Docker side, but it will probably always be very slow for now.
There's work being done on CDK to allow using a local parcel instead of inside docker, see
#9632.

For an alternative solution, I created https://github.com/vvo/aws-lambda-nodejs-webpack which uses webpack locally and is a lot faster


Hi there, when using aws-lambda-nodejs, every cdk synth will take up to 50s, even for a very simple setup. I understand aws-lambda-nodejs uses docker + parcel but even that, with the right cache, performance should not be an issue on successive runs because of Docker / parcel cache. So I guess something is weird here.

As discussed over email with you @jogold, I am creating a new issue that you can reproduce on your side.

Reproduction Steps

I created a repository: https://github.com/vvo/aws-lambda-nodejs-performance

The stack code is:

const cdk = require("@aws-cdk/core");
const lambda = require("@aws-cdk/aws-lambda-nodejs");

class HelloCdkStack extends cdk.Stack {
  constructor(scope, id, props) {
    super(scope, id, props);
    new lambda.NodejsFunction(this, "my-handler");
  }
}

module.exports = { HelloCdkStack };

Reproduction:

git clone https://github.com/vvo/aws-lambda-nodejs-performance.git
cd aws-lambda-nodejs-performance
npm install
time cdk synth

# [... cdk output]
cdk synth  1.82s user 0.63s system 8% cpu 27.730 total

Here it's 27 seconds, but I have seen it go up to 50s.

Error Log

No error message

Environment

  • CLI Version : 1.51.0
  • Framework Version: 1.51.0
  • Node.js Version: 12.18.2
  • **OS : MacOS 10.15.5
  • Language (Version): JavaScript

Other

I'd like to debug further and understand why, maybe, the cache is not used fully but not sure how to do that.


This is ๐Ÿ› Bug Report

eladb commented

Copy @jogold

This looks like #9032 (fixed in #9067) can you try installing and adding @babel/core to your package.json?

vvo commented

@jogold Just updated the repository (https://github.com/vvo/aws-lambda-nodejs-performance), same results, it always takes at least 20s on successive launches of cdk synth.

Is there a way to see the output of the docker build to see what's going on?

I am not the author of this wonderful package (aws-lambda-nodejs) but it really seems like maybe the yarn/npm cache is not kept in between docker builds maybe?

Thanks!

Had a discussion with @vvo, the issue seems to be linked to Docker performance issues on his machine. Still investigating at this stage.

vvo commented

Quick tl;dr; on the status of this issue: it's a macos issue. macos filesystem performance through docker containers has been "bad" for some time already (docker/for-mac#1592), but a lot of progress was made.

The latest update is coming on the docker for mac edge version https://docs.docker.com/docker-for-mac/edge-release-notes/ where by default they use a new syncing mechanism called "mutagen" (https://mutagen.io/) which makes things faster.

Still, on a real-size Node.js project (~ 600MB of node_modules resulting in A LOT of files), mounting the filesystem to docker takes 20s with mutagen (100s without mutagen).

The only issue right now is that for mutagen to work well, for now, you have to "flush" data to the host (otherwise asset output is empty when we check it). See docker/for-mac#1592 (comment).

Let's wait for the Docker team to reply, but macos filesystem under docker is heading to the right direction.

Alternatively, a community-supported nodejs-lambda that would directly use Parcel instead of using Parcel inside docker would make it way way faster, as discussed with @jogold. But I understand this is not the direction the cdk team has headed towards, as @jogold explained to me and that's fine!

Thanks all :)

@eladb Docker performance is really bad on macOS and especially for the NodejsFunction where we need to mount the project root, which can be "heavy". On Windows and/or Linux (so also in CI) performance is acceptable.

what do you suggest here?

eladb commented

Didn't we configure the volume mapping to use some sort of special mac sauce?

The volume mount uses :delegated but with the Edge Version of Docker4Mac this made it worse. It takes forever or crashes. And if it works by chance it fails afterwards because the sync is not yet done and CDK does not find the assets.

The volume mount uses :delegated but with the Edge Version of Docker4Mac this made it worse. It takes forever or crashes. And if it works by chance it fails afterwards because the sync is not yet done and CDK does not find the assets.

What I would like to understand is where all the time is "lost": when mounting the volumes? when Parcel runs? when content is written to /asset-output? What about using cached and/or read only for /asset-input? What is the influence of the size of /asset-input?

vvo commented

when mounting the volumes?
when Parcel runs? (+when content is written to /asset-output?)

As you know, I did a lot of testing on this, results:

On docker stable version: both actions are extremely slow. Mounting is slow (node_modules, lots of files) and parcel is slow (lots of I/O). On my current project, cdk synth can take up to two minutes, at every run. 30s was for an almost empty Node.js project (no dependencies).

On docker edge/beta version: only the mounting is slow (still 5x faster than stable), afterwards parcel is as fast as native/without docker. On docker edge, :delegated currently uses mutagen by default (may or may not be the end of the story once it reaches docker stable, it might be a different flag)

:delegated is always the fastest for any operation on docker mac os, as seen here:

Docker edge makes the mounting "faster". But as for my testing goes, 20s to mount a real world Node.js project will never be acceptable... By real world I mean my project, approx 600MB node_modules, which is not even big compared to more "enterprise" projects I have seen.

We can wait for docker to be faster (but it has been "slow" on macOS filesystems mounting for 3 years already) OR we can make it so it does not use docker on mac. I understand this is not an area you want to go but unfortunately, the road you took will, for now, lead to bad performance on mac for anyone trying to build and deploy Node.js lambdas on AWS using this tool.

This will impact A LOT of people. Data I found is from the 2018 Node.js developer survey: MacOS is leading the Node.js developer environment with 41% of users (Windows at 24%), https://nodejs.org/en/user-survey-report/#Primary-OS-Distro. This means that for most people, the experience is not good for now. And those are the people that will try cdk and abandon it if they have the choice.

For comparison, I tried https://github.com/pulumi/pulumi and it was instant almost for every command, I still want to use CDK because I do not want/need multi-cloud. And pulumi has some weird defaults as for file organization and how it builds lambdas.

Other areas to test:

  • yarn v2 performance. What might be impacting performance may not be the SIZE of the project but rather the number of files of the project. And unfortunately, node_modules are a LOOOOOT of files to mount. While new versions of yarn have fewer files in node_modules if I remember well ("pnp"?)
  • as for running parcel outside of docker, npx is GREAT! https://github.com/npm/npx#readme. It just means that on first launch it would install it, without even installing it globally and then executes it. npx is bundled with Node.js since 8.2.0

What about using cached

We tried, I tried, results are worse.

Other tests done:

  • read only + delegated on BUNDLING_INPUT_DIR: no change versus just delegated
  • read only + cached on BUNDLING_INPUT_DIR: no change versus just delegated
  • cached on BUNDLING_INPUT_DIR: no change versus just delegated
  • again, when using delegated or cached, the slow part comes from mounting, parcel is fast as seen in the docker output: parcel takes always 5s while the whole docker run takes 20s

Naive question: does the node_modules have to be mounted into the docker container? Could we do the package install inside of the container instead?

Naive question: does the node_modules have to be mounted into the docker container? Could we do the package install inside of the container instead?

The node modules are bundled so Parcel needs to be able to find them in the container. But we also mount at the project root to be sure to include everything that is referenced in the Lambda code (it could be a util in another package in a monorepo config for example).

For people having performance issues, could you report timings for synth in those two repos:

vvo commented

(Note: edited July 28th 2020 since I moved from rollup to webpack)

Hey there, I finally gave a try at creating an alternative aws-lambda-nodejs. It's here: https://github.com/vvo/aws-lambda-nodejs-webpack.

Now cdk synth|deploy takes 10s, including the build time of webpack, instead of 200s using the regular aws-lambda-nodejs (again, those 200s are 99% due to mounting the project in Docker, docker performance issue).

Features:

  • fast, no-docker CDK construct
  • lambda output only contains the necessary files, no README, tests, ... (just like current construct)
  • bundling happens in temporary directories, it never writes in your project directory
  • source map support
  • Babel with preset-env support
  • TypeScript support

As said in the README:

I want to be clear: I respect a LOT the work of the CDK team, and especially @jogold, author of aws-lambda-nodejs) that helped me a lot into debugging performance issues and explaining to me how everything works.

I hope this will help people having performance issues on macOS or if they're just looking for an alternative lambda builder.

I yet have to test it in more details in production, but as of now it just works for simple use cases. And we can always add more (I added a roadmap).

Let me know what you think!

For people having performance issues, could you report timings for synth in those two repos:

So I tested it with the monorepo and the latest Docker4Mac Edge version. The first run took 4 minutes and then failed with no output as the sync was not done in time. Subsequent runs then took about 20 seconds and succeeded.

The first run took 4 minutes

I assume this is because the Dockerfile changed in 1.54.0 and the image had to be rebuilt? Doesn't explain the failure at the end though.

Subsequent runs then took about 20 seconds and succeeded.

This is still a lot... I'm around 7 seconds on average. If you have a minute (or more ๐Ÿ˜„), could you try with the other repo where a npm install is run inside the container?

A temporary solution for macOS users could be to use the CDK_DOCKER env var to provide a custom script that parses the args and runs Parcel locally.

For the other repo the first run was about 3 minutes and subsequent runs about 40 - 50 seconds.

The first-run delay seems to be the sync that Docker does for the folder. It copies the whole folder to the hyperkit vm.

The asset-output error did not happen this time. It seems related to what @vvo described earlier.

First the requested info...
https://github.com/jogold/cdk-lambda-nodejs-monorepo
yarn cdk synth 3.85s user 0.82s system 9% cpu 48.052 total
https://github.com/jogold/cdk-s3-thumbnail
npm run cdk synth 4.77s user 0.75s system 12% cpu 44.762 total

This one is the project I'm working on right now:
npx cdk synth 4.30s user 0.48s system 7% cpu 1:00.98 total

And here's one of my sample projects: https://github.com/elthrasher/cdk-step-functions-example
npx cdk synth 5.09s user 0.94s system 3% cpu 2:39.58 total

A minute is a long time to wait - especially since I must synth to run unit tests. The numbers above are after running several times, so Docker layers should be cached as well as possible.

  1. I get the reasons for building with Docker, but building the functions one at a time as opposed to being able to have a single parcel/webpack/whatever build with multiple entrypoints isn't going to scale real well. My project has five functions, which is why it takes more than two minutes. I tried to figure out CDK_DOCKER, but it seems like I have to provide a docker-compatible API? It would be great if I could just bail out of the whole thing and agree to supply my own parcel (or webpack, which is more complex, but also more mature) build.
  2. I tried putting my stack into my build system, which is Docker-based and it's nearly impossible to get it to work. I must share my docker.sock (which is fine), but doing that means the volume share will go back to the HOST, not the container I'm trying to build in, so paths get crazy since this module attempts to take the path from the container, not host, but the volume will be shared to the host! (see also: https://serverfault.com/a/819371) For this reason, I basically flat out can't use this module without switching to another build system.
  3. Even though it looks like I need a non-Docker option, if I were to build in Docker, I'd really try to figure out a way to COPY the source into the initial build, then just share the output (perhaps via docker cp). That would give me the advantage of Docker caching on the build, which is a great thing to have. Very often my team has made a change to the stack that has nothing to do with the functions, yet we must always wait for the functions to build from scratch every time.

I'm at the point where I either have to do a two-step build (build with webpack, then use lambda.Function) or come up with my own construct, but I thought I'd see where you guys were. I can contribute here or go my own way (or maybe there's some sorcery with CDK_DOCKER you can point me to).

Thanks for listening :)

eladb commented

@jogold feels like we may need to support bundling outside of docker in certain environments. Let's kick off a GitHub issue to discuss.

Perhaps the direction should be to check it the runtime environment has the required dependencies (e.g parcel in the right version) and only if it doesn't, fall back to docker. This way we can get the docker portability without compromising in environments that support local bundling. Curious how much of this we can/should do in the bundling layer itself and how much is lambda specific.

vvo commented

@elthrasher, side note, if you want to try a non-docker lambda builder, have a look at the one I created after having performance issues with the default one, see here: https://github.com/vvo/aws-lambda-nodejs-rollup

@vvo Thanks, I saw that, but I'm using TypeScript. Would be glad to give it a try and give you feedback after you introduce support for TypeScript.

vvo commented

@elthrasher Just moved from rollup to webpack (rollup could not handle well some npm modules bundling) and added TypeScript + Babel preset-env support. Give it a try and let me know if this works for you: https://github.com/vvo/aws-lambda-nodejs-webpack

Small status update here: we are going to offer a way out of Docker. I expect this to be available before the end of August.