discussion points: Tooling group feedback session
boneskull opened this issue · 36 comments
Following up on the first tooling user feedback session, we want to gather points of discussion for future sessions.
From #38, these were the original questions from the first session:
- Describe how your Tooling leverages Node.js.
- Why do you use Node.js for Tooling?
- What's working in the Node.js tooling ecosystem?
- What isn't working in the Node.js tooling ecosystem?
- What's new that could impact Node.js?
We didn't have enough time to cover everything we wanted to--and give everyone a chance to speak--so future sessions should be more limited in scope.
I'll use this issue to gather ideas, and distill them into a "living document" of sorts (a Markdown document, living in this repo) to be used going forward.
cc @dshaw @bnb to confirm I understood this correctly.
ALSO, PLEASE UNDERSTAND: This issue isn't a platform for discussion; we're just making a rough list of topics to discuss.
One that came up offline (and I think I may have mentioned briefly) in the "what isn't working" category is portability. I'd like to expand on that a bit, from the perspective of a tooling author.
- `fs.watch()` is seemingly neglected, and doesn't work well outside of Linux, giving rise to userland modules like chokidar (a typical workaround is sketched after this list). Ideally, `fs.watch()` would work as intended across all supported Node.js platforms.
- graceful-fs shouldn't exist.
- cross-spawn shouldn't exist.
- More generally, and not necessarily a concern about portability, fs-extra, rimraf and mkdirp shouldn't exist.
- Error codes should be normalized across platforms (@iarna)
- More care given to buffering and standard input/output (@iarna)
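As a concrete illustration of the chokidar workaround mentioned above, this is roughly what tooling authors end up writing today (a minimal sketch; the glob pattern and handler are illustrative):

```js
// Reach for chokidar instead of fs.watch() to get consistent, recursive
// watching across macOS, Linux, and Windows.
const chokidar = require('chokidar');

const watcher = chokidar.watch('src/**/*.js', { ignoreInitial: true });
watcher.on('all', (event, filePath) => {
  console.log(`${event}: ${filePath}`); // e.g. "change: src/index.js"
});
```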
Feature requests
- GPU-accelerated code bindings (original issue; @TheLarkInn)
- Expose allowed command-line flags in machine-readable format (open PR)
I also want to cc everyone else who participated in that session; please add anything here you want to discuss, but didn't get a chance to. To all of you: This issue is for gathering topics of discussion for future Tooling-specific Node.js user feedback sessions. Your input is greatly appreciated!
Please cc anyone else who might want to participate!
IMHO #59 (comment) is extremely important.
Really hoping someone else can add something.
I'm happy to drive the bus here, but there's no point in driving an empty one.
Going to ping a few people who I know that work on OSS projects that use Node.js for tooling in some cases.
To those I'm pinging: we're gathering points to discuss about Node.js + the tooling ecosystem and what we could do better/what's already working well. Would ❤️ your input if you've got the time to share it, and no worries if you don't!
@johnpapa may have some observations from the Angular core community.
Not sure how much RxJS uses Node.js for its tooling, but @ladyleet may have some input (or know someone from RxJS who does!)
Pinging @zeke because I know he does a LOT with Node.js tooling, and works on Electron which has some tooling built in Node.js as well.
@tzmanics may have input from NativeScript/Angular + Node.js backgrounds.
@mxstbr does some pretty impressive work in the React ecosystem and always has super good feedback.
Yeah, for sure - you know, what's really missing is pinging the folks working on CLIs; they use Node.js a lot, of course. This is something I really hope we can bring to light in the next year and beyond. It is one of the most essential and important use cases of Node.js IMO.
For RxJS - @benlesh can provide more info.
For Angular CLI - @hansl or @Brocco
For Vue CLI - @yyx990803 or @chrisvfritz
For Ember CLI - @stefanpenner
For Preact CLI - @prateekbh
For Create React App - @gaearon or he can point you.
For React Native - maybe @TheSavior?
For NativeScript - @jenlooper or @sebawita can point you.
For Ionic - @mhartington can point you.
I know I have a fair amount of thoughts here but I think @zertosh has a superset of my thoughts so pinging him instead.
I think the main one I've talked to @zertosh about is the overhead of starting up a node script. Making scripts fast is hard, and it's one of the main things driving us to experiment more with other languages for scripts.
@TheSavior thanks much! @zertosh love to have you as a part of this conversation.
I'm not sure I have much to add beyond what @boneskull already mentioned regarding portability. That's the only significant pain point that's coming to mind. I personally don't mind pulling in separate packages, but when a built-in exists it's frustrating when it doesn't work cross-platform. A lot of people have to learn that the hard way.
I am one of the lead engineers for Appcelerator Titanium SDK's Node.js-based CLIs and tooling, which began 6 years ago. I went through our projects and analyzed the boilerplate functions and dependencies, and here's my first-pass list of things:
- As stated above, `fs.watch()` has serious cross-platform issues which I have worked around in appcd-fswatcher, namely recursion support on Linux and the ability to watch files/directories that do not exist or require privileges to access.
- It would be nice if `path.resolve()` would check if the path starts with `~` and replace it with `os.homedir()` (a userland sketch of this follows the list).
- `os.arch()` should report the machine's architecture, not the architecture of the Node binary. A script should be able to identify if it's being run in a 32-bit version of Node on a 64-bit machine. We use this for determining which native executable dependencies to install/run and for telemetry so we can track when we can drop 32-bit support. Side note: I know there was talk of Node dropping 32-bit support, but I don't think the world is there yet.
- Built-in support for checking if a path is an executable file. It's not enough to check flags or file extension. For now, we can use isexe.
- Built-in support for `which`. It is very handy for finding the full path to an executable and ensures that the path is indeed an executable (see previous item).
- Built-in support for generating UUIDs. I know there are package name conflicts, so maybe hang it off util or crypto. I know I can generate random bytes using crypto, but I'd like to not have to pull in a dependency to generate a simple v4 UUID.
- A lighter-weight sandbox construct. `vm.createContext()` is perfect for 100% isolation, but I want a jail. If the code in the sandbox throws a `TypeError`, I can catch it outside, but I can't do an `instanceof TypeError` check because the built-in types are not the same definition.
- The ability to completely destroy a vm context. You can create contexts all day, but apparently they can only be cleaned up on exit.
- This is a stretch, but temp directory/file creation and cleanup would be awfully nice.
- The ability to spawn another Node process with a TTY context such that the child Node process can `pause()`, `setRawMode()`, etc. on `process.stdin`. This would be great for integration testing so we can use mocha instead of writing shell scripts.
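As an example, the tilde-expansion item above amounts to userland boilerplate like this (a minimal sketch; `resolveWithTilde` is a hypothetical helper, not an existing API):

```js
const os = require('os');
const path = require('path');

// Expand a leading "~" to the home directory before resolving, since
// path.resolve() does not do this today.
function resolveWithTilde(p) {
  if (p === '~' || p.startsWith('~/') || p.startsWith('~\\')) {
    p = path.join(os.homedir(), p.slice(1));
  }
  return path.resolve(p);
}

console.log(resolveWithTilde('~/projects/app')); // e.g. /home/user/projects/app
```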
@boneskull We are looking at 2018-06-08 for the next session.
I work on internal developer tools at Facebook - mostly on CLI tooling for developers working in our monorepo. There's a huge ecosystem of tools in this space, and some of it is written in JS running on Node. Other tooling is written in Python, PHP/Hack, Bash and even Rust. So Node often gets compared to these (I won't go into that here).
Why do you use Node.js for Tooling?
- Cross-platform support.
- Tooling has to work on Macs, Linux, and Windows. For the most part, if you stay clear of any native modules, Node will Just Work on whatever platform.
- Ease of distribution.
- We've made it really easy to use Node for tools in the monorepo. We check-in the Node runtime for all of our target platforms, and use Yarn with an "offline mirror" to install npm dependencies. This frees tool authors from having to worry about bootstrapping their tool in whatever environment it has to run (e.g. developer machines, CI system, etc).
- Tools that need to work w/o a checkout can have a simplified build that produces a single JS artifact and includes all of the Node runtimes. This frees authors from having to worry about producing multiple releases of their tool.
What isn't working in the Node.js tooling ecosystem?
- Startup time.
- This is the single biggest complaint.
- The overhead of starting the Node runtime is really hard to hide even for noop commands like `mytool --help` or `mytool --version`. Some tools that get invoked thousands of times during builds have had to be ported away from Node because of the cost of loading the runtime.
- But even loading user code takes a perceivable amount of time. We've tried to mitigate this with bundling and v8-compile-cache (see the sketch after this list). I'm hopeful and looking forward to snapshots.
- Memory use.
- Again, when invoking a tool thousands of times during a build, this is a huge issue. One recent internal benchmark compared a Node tool against an equivalent port in another language, and found the memory use at 83MiB vs 4MiB. That order of magnitude is really hard to argue against.
- Standard library.
- I agree with keeping core small, but if you're going to ship an `fs` module, then it should be more complete. Every one of our packages has mkdirp, rimraf, shell-escape, and temp, but I'd argue that these should be in core.
- For small tools, it's a hard sell to use Node when another language/runtime has everything you need out of the box.
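As an illustration of the v8-compile-cache mitigation mentioned above, a CLI entry point ends up looking roughly like this (a minimal sketch; the `./cli` module and `runCli` are hypothetical names):

```js
#!/usr/bin/env node
// Requiring v8-compile-cache first caches V8's compilation output on disk, so
// repeat invocations skip recompiling modules they've already seen and start
// up faster. It does not help with the cost of booting the runtime itself.
require('v8-compile-cache');

const { runCli } = require('./cli'); // hypothetical module
runCli(process.argv.slice(2));
```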
@zertosh great feedback, thanks. "Node.js CLIs at scale" brings up some pain points I hadn't heard before.
@cb1kenobi Thanks! I was wondering if you can explain this further:
> A lighter-weight sandbox construct. vm.createContext() is perfect for 100% isolation, but I want a jail. If the code in the sandbox throws a TypeError, I can catch it outside, but I can't do an instanceof TypeError because the built-in types are not the same definition.
I'm not sure what you mean by "jail" or how what you want is different than this:
```js
const vm = require('vm');

try {
  vm.runInNewContext('throw new TypeError()', global);
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```
@boneskull When I did it, I didn't use `global`. My use case was to run untrusted code in a sandbox/jail where the code couldn't modify `global` or do things like `process.exit()`.
I tried defining my own global object, but I found myself passing globals in such as `TypeError`, `RangeError`, etc. so that my `instanceof` checks worked. This turned out to be a huge pain, and I found that vm2 had a much more elegant solution.
vm2 wraps everything in a proxy and wires up this contextify/decontextify thing that tries to convert objects across contexts, like this: https://github.com/patriksimek/vm2/blob/master/lib/contextify.js#L216-L232. Aside from the performance issues, vm2 suffers from many issues; I finally gave up when I ran into patriksimek/vm2#62.
I ended up spawning a Node script that runs the untrusted code in a subprocess and I watch for the process exiting abnormally and stdout/stderr. Turns out this works better for our use case because we can run the code with a specific Node.js version.
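A minimal sketch of that subprocess approach (the helper name, timeout, and options are illustrative, not the actual implementation):

```js
const { spawn } = require('child_process');

// Run an untrusted script with a specific Node binary and watch its exit
// status and stdio, instead of trying to jail it inside the current process.
function runUntrusted(nodeBinary, scriptPath, { timeout = 5000 } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(nodeBinary, [scriptPath], { stdio: ['ignore', 'pipe', 'pipe'] });
    let stdout = '';
    let stderr = '';
    const timer = setTimeout(() => child.kill('SIGKILL'), timeout);

    child.stdout.on('data', chunk => { stdout += chunk; });
    child.stderr.on('data', chunk => { stderr += chunk; });
    child.on('error', reject);
    child.on('close', (code, signal) => {
      clearTimeout(timer);
      if (code === 0) {
        resolve({ stdout, stderr });
      } else {
        reject(new Error(`sandboxed script exited abnormally (code=${code}, signal=${signal})\n${stderr}`));
      }
    });
  });
}
```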
I have other use cases for a sandbox. My app supports `.js` config files (the default export is the config) and it would be nice to load them in a sandbox and have it export the config.
@cb1kenobi Ah, yeah, I understand now. I've gone down the same path to vm2 myself.
Describe how your Tooling leverages Node.js.
Why do you use Node.js for Tooling?
What's working in the Node.js tooling ecosystem?
ember-cli is implemented in Node, which enables us to:
- share developer mindshare between our consumers (often JS developers) and our tooling
- utilize tools such as Babel for client-side code (and its various TC39 experiments)
- offer a relatively easy and portable installation story (compared to Ruby, etc.)
- tap into the knowledge/power of V8/TC39 and the various Node TCs in our foundation (each upgrade gives us cool stuff, like async/await)
- LTS is great; we aligned ours with Node's. This makes rollout at companies simple and easy.
What isn't working in the Node.js tooling ecosystem?
- Parallelism, threads: Although we achieve parallelism in some build steps (such as babel/uglify, etc.) via spawned subprocesses, this is a relatively costly and brittle experience. I hope the node worker or similar efforts will help us here. Specifically, lightweight parallelism (with debuggability) would be quite empowering.
- CaptureExit: Coordination between multiple libraries wanting to do on-process-shutdown work is fraught with peril: context, work-around library
- Native extensions: Although quite powerful, we essentially cannot rely on them for our use-case. When we have relied on them, issues with python/node-gyp or recompiling between Node version switches swamped our support and hurt our getting-started story.
- npm: a consistent issue prevented the ember-cli + Node 8 rollout (it broke CITGM's ember-cli scenario). An npm maintainer pushed a work-around that hides the error but doesn't address the issue; this was done without tests or deeper investigation. When concerns were raised and deeper analysis provided, comments were simply deleted. (I am hoping my search skills are just poor, but when searching today I could no longer find...)
- File watching: `fs.watch` is OK for small projects, but for large ones persistent systems such as watchman are required.
- CLI startup time: requiring files is costly, especially if your CLI has an add-on system.
What's new that could impact Node.js?
@stefanpenner Thanks. I've also had struggles with exit hooks. Core could provide a better experience here.
(Discussion around npm is out-of-scope for us, however.)
A proper async module loader seems like it could improve startup time of CLIs (I'm not current on what's happening w/ Modules, unfortunately).
Thx @stefanpenner for chiming in!
Hi there, Angular CLI lead here.
Describe how your Tooling leverages Node.js.
Angular CLI is built using TypeScript and runs using Node. It's a CLI tool to create, manage and deploy Angular applications and libraries. We do not have an addon system (yet), but we do use libraries for workflows (Schematics) and tasks (Architect) which allow extension points from the user. These are dynamically loaded on an as-needed basis.
Why do you use Node.js for Tooling?
Reusability of knowledge between framework developers and tooling developers. Easy to set up on Mac and Linux. The community is vast and eager to help, which is always nice.
What's working in the Node.js tooling ecosystem?
Debugging story, LTS, release cycle (clear and to the point), respect for semver.
What isn't working in the Node.js tooling ecosystem?
- I have a few gripes about `npm` itself:
  - npm never contacted library authors about the `npm audit` changes, resulting in panic from a lot of people before we even had a chance to update and react appropriately. We're still not fully updated yet, and have issues filed regularly because some dependency five levels down has a regex with backtracking and `.*`.
  - On a similar track: it is impossible to replace (or remove) a deprecated, obsolete, or invalid indirect dependency on install. Optional indirect dependencies can also lead to users thinking something went wrong when it didn't. Allowing this would mean we could override versions of indirect dependencies that don't pass audit with versions that do, without having to wait for a release of every other package down the line. This can be very hard to work with sometimes.
  - Lack of `optionalPeerDependencies` (or a similar mechanism), which leads to a lot of warnings even if some dependencies could be provided by the project. I understand that this is akin to not having a dependency at all, but there's no way to communicate to projects that "we support this dep".
  - Lack of a good (or any, really) API for calling npm from Node. We do npm installs a lot (on new projects, on updates, on adding dependencies, etc.) and always end up spawning an npm process using the command line, which IMO is an antipattern (see the sketch after this list).
  - I would like an officially supported JSON Schema for `package.json`. Something we could validate against would be nice. The ones I found were neither official nor complete (closest is probably http://json.schemastore.org/package).
- Similar to Stefan's complaint, I'd like to have a proper native fs watch API. Depending on chokidar led to some inconsistencies over minor versions, licensing issues, and npm audit problems.
- A better install story on Windows, but I don't know how easy that is to set up. We have many newcomers from Windows who are used to click-and-get-started, but with Node+npm it takes at least a few hours to explain command-line stuff and how to use a command line.
- Node async APIs are still entirely callback-based (my search skills might be poor, but I don't think this is on anyone's todo list), which makes promise chaining and async/await a bit hard to use (natively).
- Streams-to-Observable libraries helped us significantly to make sense of Node's pipes; I would like a native solution for pipe APIs (that is typed properly and isn't string based).
- No way to resolve a package globally; it's hard to discover where the global node_modules is.
- Using Bazel (which makes extensive use of symlinks), we found in general that Node was confusing and really hard to work with because of symlinks. When using `require()` with a local path it doesn't check whether the script was symlinked or not. Using `--preserve-symlinks` works for some use cases, but then breaks some libraries. We're working with the libraries that broke to resolve it.
- Custom module resolvers could be easier to extend for Node. Overloading methods that start with an underscore is a bit awkward, but maybe I'm misunderstanding how to do it.
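For reference, "spawning an npm process" in practice looks roughly like the sketch below (an illustrative helper, not the Angular CLI's actual code; on Windows you'd need `npm.cmd` or a shell):

```js
const { execFile } = require('child_process');

// There is no supported programmatic npm API, so tools shell out to the CLI
// and inspect the result.
function npmInstall(packages, cwd) {
  return new Promise((resolve, reject) => {
    execFile('npm', ['install', '--save', ...packages], { cwd }, (err, stdout, stderr) => {
      if (err) return reject(new Error(`npm install failed: ${stderr}`));
      resolve(stdout);
    });
  });
}

npmInstall(['left-pad'], process.cwd()).then(console.log, console.error);
```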
What's new that could impact Node.js?
Native support for `import` (both dynamic and static), better startup time. Better typing for Streams (might be more a complaint about the poor stream typing in the typescript package).
I think in general Node itself is in pretty good shape.
> A proper async module loader seems like it could improve startup time of CLIs (I'm not current on what's happening w/ Modules, unfortunately).
@boneskull we use time-require to debug these issues, but it's a fairly brittle/easy-to-regress model. Strategies to reduce disk I/O during startup for CLIs would likely be handy. I can imagine several approaches, all possible, but they require someone's time.
We're now starting a user feedback session. Join us at https://zoom.us/j/168395218 or just watch at https://youtube.com/c/nodejs+foundation/live
@hansl @stefanpenner @zertosh please join!
I understand the desire for `fs` to be minimal, but if it's going to support creation and removal of objects, it should also support nested creation and removal. Admittedly, these are not operations the underlying C/C++ APIs perform, but Node is higher-level and should provide at least some extra benefit in this area.
@boneskull oh man, I wish I would have known. I would have attended. Any future sessions planned?
@stefanpenner yes but no dates yet. it will not be the same time slot.
I would like to add an additional fs-related area of improvement: directory/file traversal.
1. Efficiency of `fs.readdirSync` + `fs.statSync` vs. Python's (or Rust's) os.scandir() (see the sketch after this list).
2. Confusion around globbing: glob/micromatch/minimatch bug or feature inconsistencies between versions and libraries result in confusing user experiences.

With 1, userland could implement even more efficient file system traversal and globbing modules. Improvements to walk-sync would be palatable.
With 2, we as tooling authors could provide a better, more consistent experience.
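For item 1, the pattern in question is a walk built from `fs.readdirSync` + `fs.statSync`, roughly like this sketch (each entry costs an extra stat() syscall, which is what scandir-style APIs avoid):

```js
const fs = require('fs');
const path = require('path');

// Synchronous recursive walk: one readdir per directory plus one stat per
// entry to decide whether to recurse.
function walkSync(dir, results = []) {
  for (const name of fs.readdirSync(dir)) {
    const full = path.join(dir, name);
    const stat = fs.statSync(full); // extra syscall per entry
    if (stat.isDirectory()) {
      walkSync(full, results);
    } else {
      results.push(full);
    }
  }
  return results;
}
```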
For build tools, utilizing parallelization to make full use of available hardware is important. Today we use child_process; tomorrow we will hopefully use worker_threads. But given these two solutions, there exist some pain points.
- Debuggability: This may be addressed via inspector support? But improvements to `node debug` also seem prudent.
- The cost of loading code (such as babel) in each subprocess is quite high (see the sketch after this list); I wonder if a cache (especially for worker_threads) could be used to avoid some of this cost.
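A rough sketch of the child_process-based approach (the worker script path is hypothetical); each forked child re-loads its own code, which is the startup cost noted above:

```js
const { fork } = require('child_process');

// Fork a Node subprocess per job and exchange work over IPC. The worker
// script is assumed to listen for 'message' and reply via process.send().
function runJobsInSubprocesses(workerScript, jobs) {
  return Promise.all(jobs.map(job => new Promise((resolve, reject) => {
    const child = fork(workerScript);
    child.once('message', result => { child.kill(); resolve(result); });
    child.once('error', reject);
    child.send(job);
  })));
}

// runJobsInSubprocesses('./transform-worker.js', files.map(f => ({ file: f })));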
child_process.exec shortcomings: Today, I have found myself porting many modules to simply use execa over `child_process`, as it addresses common bugs. Although it implements more, the following are the areas I want to draw attention to (a small execa sketch follows the list):
- Cleanup when the parent process dies. Accidentally leaked processes are the norm in Node; an API that makes cleanup the default would address this.
- Smooths over some windows quirks: summary
- returns promises
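A small example of the execa usage pattern being described (the command is illustrative):

```js
const execa = require('execa');

// Promise-based result, cross-platform argument handling, and cleanup of the
// child when the parent dies are handled by execa rather than by each caller.
async function gitHead(cwd) {
  const { stdout } = await execa('git', ['rev-parse', 'HEAD'], { cwd });
  return stdout.trim();
}
```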
@boneskull keep me posted, I will make time (as long as I know ahead of time)
I watched the youtube live stream and I agree with most of the issues brought up.
Basically, bring these into core `fs`:
- mkdirp
- rimraf
- graceful-fs?
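As an example of what core would be absorbing, a minimal mkdirp amounts to roughly this (a sketch; the real package handles more edge cases such as permissions and race conditions):

```js
const fs = require('fs');
const path = require('path');

// Create a directory and any missing parents.
function mkdirp(dir) {
  try {
    fs.mkdirSync(dir);
  } catch (err) {
    if (err.code === 'ENOENT') {
      mkdirp(path.dirname(dir)); // create the parent first, then retry
      fs.mkdirSync(dir);
    } else if (err.code !== 'EEXIST') {
      throw err;
    }
  }
}
```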
Also, `os.arch()` and `process.arch` are misleading because they give you the architecture that Node was built with, not the architecture of the user's OS.
Although it seems like 32-bit is slowly going away?
Notes here: #70
I just thought of something. We use Babel to transpile and we embed the source maps at the end of each file in the `//# sourceMappingURL=...` comment. Node 8 (not sure about 10) does not process the source maps, thus the line numbers in the stack traces aren't accurate.
Today we have to use source-map-support and call `register()` to patch `Error.prepareStackTrace()`. That's OK, but source-map-support eats up 940KB of disk space. It would be nice if Node had built-in source map support; then CLIs with transpiled code could be smaller, which is a win, and perhaps this would be useful for the inspector too.
it'll be generally hard to argue that the savings of 940KB is worth the maintenance of taking that on in core. are we talking about resource-constrained systems? poor internet service? is there another good reason?
I was going through my packages trying to slim them down and nearly every one of them depends on source-map-support. With package hoisting, it's not a super big deal. My main project still has 4 different versions of source-map-support.
Being that source maps will probably never go away, I don't think it's a horrible idea if there was first class support for them, though I'm fine with leaving this up to a community module. With TypeScript and Babel being relevant these days, perhaps others would like it if Node could take on the burden of source maps.
As I'm poking around my various dependencies, it seems the larger packages are those where the author bundled their compiled/transpiled source and the original source. This seems to be especially the case for TypeScript based packages. Some packages even bundle tests. I'm sure some people appreciate the original source and tests being bundled with the package, but I prefer my dependencies as lightweight as possible.