Audit build process for dependencies
mhdawson opened this issue · 24 comments
Following the great work in #828 I think the next step is to audit any build steps related to dependencies and tools used in those build steps from a repeatability and supply chain perspective.
For dependencies that only contain a direct copy of the original source code for the dependencies and are completely built from that source during the Node.js make process there should be nothing to do other than to confirm they fall into that category.
For others, which have additional build steps, we'll want to see what gets used/pulled in during those build steps, and see if we are comfortable that we can get the exact versions of the tools used in the future and that we have enough information to complete the build steps, if needed for some future Node.js release, in a way that generates the same result. If we dynamically pull in tools we should also confirm we are confident that the source is trusted and not a potential vector for a supply chain attack.
A good example of this is undici, where there are WASM binaries that are built outside of the Node.js make step. These use dynamically installed packages in a docker container. Would we get the same version of those packages if we installed them a year from now? If not, we might be introducing unexpected changes as we do maintenance releases.
Let's go through each of the dependencies and categorize them into:
- Original source - no issue, fully built in the Node.js make
- Build steps from original source - fully documented and repeatable
- Build steps - needs work or unsure.
- acorn
- acorn-walk
- ada
- base64
- brotli
- cares
- cjs-module-lexer
- corepack
- googletest
- histogram
- icu-small
- llhttp
- minimatch
- nghttp2
- ngtcp2
- npm
- openssl
- postject
- simdutf
- undici
- uv
- vwasi
- v8
- zlib
Also note that we may need to not only look at the steps in the tools/dep_updaters directory but also look at the tools used to generate a new release of the dependency if what we copy over contains what would not be considered original source.
In terms of the dependencies which build WASM (undici, cjs-module-lexer, llhttp), some of them currently use a Dockerfile to dynamically build a container and use apk to install packages. I have a feeling that what we get from the apk install may change over time.
We may be able to improve repeatability by building that docker image in advance and storing it in the GitHub container registry in the nodejs org. We could then use the same set of images across all projects where we build WASM binaries.
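One way to make the pre-built image idea enforceable is for the build scripts to refuse any image reference that is not pinned by digest, since a tag can be moved to different content later. A minimal sketch, assuming a hypothetical helper name `is_pinned_image`; nothing in this thread defines such a check, and the image names in the comments are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: succeed only when an image reference is pinned by
# digest (name@sha256:<64 hex chars>) rather than by a mutable tag such
# as "alpine:latest".
is_pinned_image() {
  case "$1" in
    *@sha256:*)
      digest=${1##*@sha256:}
      # A sha256 digest is exactly 64 lowercase hex characters.
      [ "${#digest}" -eq 64 ] || return 1
      case "$digest" in
        *[!0-9a-f]*) return 1 ;;
        *) return 0 ;;
      esac
      ;;
    *) return 1 ;;
  esac
}
```

A build script could call `is_pinned_image "$IMAGE" || exit 1` before invoking docker, where `IMAGE` is whatever variable the script uses for the builder image.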
The main problem we have right now is that builds are not repeatable because we don't keep track of certain information, such as the versions of the tools we use to install, package locks, and the sources themselves (example: the dependency's GitHub project is deleted).
I guess the first step is to identify how our dependencies are installed and what they contain.
We can check the following points for any dependency:
- Where do we download the dependency from? (npm, github release, github source code, google source) This information should probably be added in the maintaining dependency document
- Do we execute an install command (npm install, make, configure), or do we just copy files?
- Is the dependency code equal to the source code of the project where we download from? Is it minified or changed during the release process?
- Are we aware of the tools required to install that dependency? For example, if we do npm install, do we keep track of the npm version or lockfile?
Feel free to edit, add more questions, improve, etc.
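For the last question in the list, one low-effort option is to have each update script write down the versions of the tools it ran. A minimal sketch, assuming a hypothetical helper name `record_tool_versions` and assuming each tool accepts `--version` (true for npm, perl, tar and shasum as far as I know, but not guaranteed for every tool):

```shell
#!/bin/sh
# Hypothetical helper: print one "tool<TAB>version" line per tool so the
# output can be committed next to the dependency it was used to update.
record_tool_versions() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Keep only the first non-empty line of the --version output
      # (perl, for example, starts its output with a blank line).
      version=$("$tool" --version 2>/dev/null | sed -n '/./{p;q;}')
      printf '%s\t%s\n' "$tool" "${version:-unknown}"
    else
      printf '%s\t%s\n' "$tool" "not installed"
    fi
  done
}
```

An updater could then run something like `record_tool_versions npm perl tar shasum > tool-versions.txt`; the file name is only an illustration.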
I suggest using the same approach we used for automating dependencies. We could have a main track issue (possibly this one), and whenever we work on a specific version, we can create a separate issue to discuss it there.
Please have a look at this PR nodejs/node#49747
acorn-walk
- it's javascript
- It's in the acorn monorepo
- It goes through bundling with rollup pre release
- It is installed by downloading the tarball from npm registry (example: https://registry.npmjs.org/acorn-walk/-/acorn-walk-6.0.0.tgz) with npm pack and the content is copy pasted into /deps/acorn/acorn-walk
tools we use to install:
- npm:
  - npm view acorn-walk dist-tags.latest (to check latest upstream)
  - npm pkg get version (to get installed version)
  - npm pack (to download)
- shasum
- perl
- tar
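Since shasum is part of the toolchain, the downloaded tarball can be checked against a digest recorded during a previous update. A minimal sketch, assuming SHA-256 and hypothetical helper names `sha256_of` and `verify_tarball`; the thread does not say which algorithm is used or where the expected digest would be stored:

```shell
#!/bin/sh
# Print the SHA-256 hex digest of a file, using whichever tool is present.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Hypothetical helper: fail unless the file ($1) matches the digest ($2)
# that was recorded when the dependency was last updated.
verify_tarball() {
  actual=$(sha256_of "$1") || return 1
  if [ "$actual" != "$2" ]; then
    echo "checksum mismatch for $1: expected $2, got $actual" >&2
    return 1
  fi
}
```

Calling `verify_tarball "$TARBALL" "$EXPECTED_SHA256"` right after npm pack would stop the update if the registry ever served different bytes for the same version; both variable names are placeholders.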
acorn
- it's javascript
- It's in the acorn monorepo
- It goes through bundling with rollup pre release
- It is installed by downloading the tarball from npm registry with npm pack
- we move the folder in deps/acorn/acorn
- we update the /src/acorn_version.h file with the new version
tools we use to install:
- npm:
  - npm view acorn dist-tags.latest (to check latest upstream)
  - npm pkg get version (to get installed version)
  - npm pack (to download)
- shasum
- perl
- tar
minimatch
- it's javascript
- Minimatch repo
- It is installed by downloading the tarball from npm registry with npm pack and the content is copy pasted into /deps/minimatch
- It goes through a bundling process done by us because the library has a dependency, brace-expansion, so we need to install it and bundle it:
  - install esbuild: npm install esbuild --save-dev (keeps track of everything in package lock)
  - add a script for bundling in the package json: npm pkg set scripts.node-build="esbuild ./dist/cjs/index.js --bundle --platform=node --outfile=index.js"
  - run bundling: npm run node-build
  - create an index.js self contained file
tools we use to install:
- npm:
  - npm view minimatch dist-tags.latest (to check latest upstream)
  - npm pkg get version (to get installed version)
  - npm pack (to download)
- shasum
- perl
- tar
- esbuild
undici
- it's javascript and wasm (llhttp)
- Undici repo
It is installed by the following process:
- create a temp folder
- npm init
- npm install undici --global-style --no-bin-links --ignore-scripts
- cd node_modules/undici
- npm install --no-bin-link --ignore-scripts
- npm run build:node (this is a command in the undici package json), which runs this command:
  npx esbuild@0.19.4 index-fetch.js --bundle --platform=node --outfile=undici-fetch.js --define:esbuildDetection=1 --keep-names
- It goes through a bundling process with esbuild (note the fixed version) that generates undici.js
- move the content inside deps folder
tools we use to install:
- npm
- shasum
- perl
- tar
- esbuild
I think this process can be improved a lot, since we are not keeping track of the tools used and we do at least one npm install that could be skipped by downloading undici with npm pack.
Also, the fixed version of esbuild in the command should be updated regularly (we are at v0.19.11).
The point of esbuild is to create a single file that includes all the dependencies.
The wasm files are built during release, they come within the package.
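To keep an eye on the pinned esbuild version, an updater could extract it from the build command and compare it with the version we expect. A minimal sketch, assuming a hypothetical helper name `pinned_version` and assuming the package name contains no characters that are special to sed (so it would not work as written for scoped names such as @scope/pkg):

```shell
#!/bin/sh
# Hypothetical helper: print the version pinned for a package ($1) in a
# command string ($2) such as "npx esbuild@0.19.4 ...".
# Prints nothing when the package is not pinned in that string.
pinned_version() {
  printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1@\([0-9][0-9.]*\).*/\1/p"
}
```

For the command quoted above, `pinned_version esbuild "$cmd"` prints 0.19.4, which a script could compare against the version recorded in the last update.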
cc @mhdawson
@marco-ippolito thanks for the summary will try to take a look tomorrow
@marco-ippolito I think we should add to your list above for undici, because I believe these are run automatically as well to generate wasm:
"prebuild:wasm": "node build/wasm.js --prebuild",
"build:wasm": "node build/wasm.js --docker",
These run docker and build a container, which may end up with different versions of the wasm tools and dependencies each time they are run.
As discussed it would probably be good to have a session dedicated to working through this example and the other JavaScript deps since it sounds like you have looked at a number of them and we have a number of decisions to make. Would you like to set that up?
Deep dive
Please add your availability on the calendar, I will then send an invitation based on it.
https://doodle.com/meeting/participate/id/e3Yjn2xb (PS: use an ad blocker, there are ads because it's the free version 😢)
@nodejs/security-wg
@marco-ippolito filled in my avail
@nodejs/security-wg I'm going to leave the doodle open for the weekend, then pick a date.
according to the doodle the preferred date is:
Date: 1st of February
Time: 3:00 PM to 4:00 PM (UTC+1) Italy Time
@rudd
@richardlau
@mhdawson
If it's ok we can create a zoom link and add it on the calendar
@marco-ippolito let's just use the zoom link from the Security-wg. There are no other Node.js meetings going on at the same time - https://zoom.us/j/92309450775
@mhdawson I can see a calendar entry for this in https://nodejs.org/calendar, but I think it's at the wrong time?
(times here are UTC)
It should be 1 hour before the security meeting
2pm UTC
@mhdawson I noticed the event on the Node.js calendar, but at the wrong time
@marco-ippolito moved entry in calendar.
Towards the end of the meeting there was a bit of confusion over corepack not having an updater script in tools/dep_updaters; that's because the update steps are in the Makefile.
Notes from the meeting today, sorry we forgot to record
- In order of preference
  - All build steps are in Node.js build
  - Run build steps in deps update scripts
  - Build steps in repo/manual check
- General principles
  - We should have a copy of primary source (likely GitHub) and run build steps
    - Must keep copy of project as GitHub is not immutable
    - Ok to pull tools from npm, we must keep package-lock
- npm
  - trust npm, provided we keep package-lock, assets downloaded from there are immutable and
  - Link back to GitHub repo, can we even trust npm contents?
  - can't trust that package does not pull in other assets outside of npm
    - for example pre-built binary
    - Run with no-scripts
      - Does this turn off binding.gyp
    - Run, with networking restricted to only npm (for npm install)
    - Run, with networking completely disabled for other steps like esbuild
- Typescript projects ?
  - Conversion of ts
  - Publish as js
  - No info on transpilation step
  - Similar to ESbuild, so likely a concern, led to discussion of first principles above, should we include any code from npm or go to original project
- Looked at deps folder, in JavaScript ones
  - acorn, acorn-walk
    - Pure JavaScript
    - npm init
    - npm install acorn
    - Then take node_modules which would only have acorn
    - Updated to
      - npm pack, download directly zip
      - unzip
      - then copy paste
    - Improvements
      - could use curl
  - Minimatch, more complicated because it has deps
    - Original
      - Has typescript
      - npm init
      - npm install
      - runs esbuild to pack into one JavaScript file
      - one js file is copied over to deps
    - New
      - npm pack
      - npm install, keep package.lock
      - run esbuild (now as node build)
      - one js file is copied over
  - undici
    - Has deps
    - Has build step
    - Build step is already in package.json
    - Steps
      - npm init
      - npm install undici
      - npm run build:node
        - uses npx with esbuild at a fixed version (will likely move esbuild back to AIX)
        - pre-req step uses alpine docker image to build wasm
  - corepack, pretty simple, cut paste
  - cjs-module-lexer
    - Has separate build step, generates WASM