nodejs/node-eps

Improving support for symlinks in the module system

jasnell opened this issue ยท 88 comments

Continuation of the discussions nodejs/node#10132 and nodejs/node#10107 ... Please use this thread to discuss the high level architectural, merits, and long term impact of the proposed change so that the PR discussion can be focused on the actual code review.

@phestermcs Please stop trying to change the process. I'm literally in the middle of writing a response to the issues you brought up in the PR right now. Let's do things the way that the Node project does things, respecting the established process, and try to follow the lead set out by the project maintainers.

I'm attempting to offer solutions for the following when using symlinks:

  • Memory Bloat
  • Addon Crashing
  • Tooling issues related to "main.js" file symlinks causing the __dirname to be the realpath as opposed to the link's target (where there may by symlink dirs in the target path the must be preserved)
  • Using Machine Level stores while keep dependency version resolution coupled to a given /node_modules root.

Pinging @bmeck. We should consider the impending ESM support and any impact module system changes might have with that in mind.

Re nodejs/node#10132 (comment)

@isaacs You're link to a proposed solution seems to be addressing peer dependencies, and is not clear as to how it would address the issue you state it would.

I see. So, to make sure we're on the same page, let me try to describe the problem, and y'all can tell me if I'm missing anything.

Suppose two apps with the same logical dependency requirements:

one
โ”œโ”€โ”€ baz@1.2.5
โ””โ”€โ”€ foo@1.2.3
two
โ”œโ”€โ”€ baz@1.2.5
โ””โ”€โ”€ foo@1.2.3

Let's further stipulate that foo depends on bar@1.x and baz depends on bar@9.x.

foo@1.2.3
โ””โ”€โ”€ bar@1.x
baz@1.2.5
โ””โ”€โ”€ bar@9.x

We install in one, and get this logical result:

one
โ”œโ”€โ”€ baz@1.2.5
โ”‚   โ””โ”€โ”€ bar@9.0.0
โ””โ”€โ”€ foo@1.2.3
    โ””โ”€โ”€ bar@1.2.5

Some time later, after some updates have been published to bar, we install in two and get this result instead:

two
โ”œโ”€โ”€ baz@1.2.5
โ”‚   โ””โ”€โ”€ bar@9.8.7
โ””โ”€โ”€ foo@1.2.3
    โ””โ”€โ”€ bar@1.2.5

If we use an approach like pnpm and ied, where every instance of foo@1.2.3 and baz@1.2.5 are symlinked into a shared store, and in that shared store, they each have their deps linked into their appropriate node_modules folders, then when we install two, it performs some unexpected action at a distance, updating one's meta-dependency because it updates baz@1.2.5 in the module store.

So this arrangement would thus be impossible:

one
โ”œโ”€โ”€ baz@1.2.5
โ”‚   โ””โ”€โ”€ bar@9.0.0  <-- because this...
โ””โ”€โ”€ foo@1.2.3
    โ””โ”€โ”€ bar@1.2.5
two
โ”œโ”€โ”€ baz@1.2.5
โ”‚   โ””โ”€โ”€ bar@9.8.7 <-- conflicts with this
โ””โ”€โ”€ foo@1.2.3
    โ””โ”€โ”€ bar@1.2.5

This is of course trivial to do without symlinking into a store, because you just have multiple copies of stuff. This is the "disk is cheap" approach. It works pretty well today, because disk is pretty cheap, but it does incur some less obvious costs like building packages, downloading multiple times, etc. There are some other ways to mitigate those costs, of course, and they should be explored separately.

However, in this challenge, we want to say that there's exactly one copy on disk of baz@1.2.5 package contents. The "disk is expensive" model.

Furthermore, in any satisfying solution, (A) there must be a logically correct dependency graph loaded at run time (ie, everything gets a copy of what it depends on, meaning that there are 2 copies of bar in the runtime no matter what, to resolve the dependency conflict), and (B) the two apps have different versions of bar@9.x served to their respective (linked) copies of baz. (If we just did the "resolve deps based on symlink destination as well as target" approach, we'd have a problem because we can't put both versions of baz in one/node_modules or two/node_modules.)

If I'm not understanding the nature of the logical puzzle, please let's stop and clarify. I don't want to rush to solutions until the problem is very well understood.

bmeck commented

No known problems with ESM on this front. If we were talking archive formats that serialize symlinks like ASAR / NODA / WebPackage we might have things to think about, but none are currently supported by Node.

Now that we have established the technical challenge and agreed on the terms of it, I'd like to seek alignment on the heuristics to be used in evaluating potential solutions.

Here's a sketch of a list of what's important to me, personally, in ranked priority order. This is a question of values, and it's necessarily somewhat subjective, so I'd love to get some input from others in the nodejs core maintainer group about what the project's priorities are.

  1. It must be backwards compatible with current node, npm, ied, and pnpm use cases. If the only way to fix this problem is to break our current ecosystem, then we just have to live with the problem. We must not up-end our community, no matter what.
  2. It must satisfy the conditions of the challenge. (Of course, or else it's not a solution.)
  3. It must add a minimum of new surface area to the module loading architecture. (If there is a solution that requires no change to the node module loader, then that would be ideal here, but that might not be possible. Less change is better, in general.)
  4. It must not cause a significant slowdown in the Node.js startup time. (For example, searching 3 extra module locations is almost certainly fine. 300 would likely be a problem.)
  5. It should ask a minimum of additional complexity on the part of module architecture schemes seeking to solve this sort of problem. This is the lowest priority, because once we show that it is possible, the implementation is just a matter of typing.

Does this seem like a fair set of heuristics for judging solutions? I realize that these will be somewhat in conflict with one another, but that's why they call it a trade-off :)

If it seems that I'm being overly methodical and pedantic, it is because any change to the module system demands extreme diligence. Last time there was a disruption here, I promised that I would do everything in my power to help avoid it in the future.

Or is it your intent that this exceptional change not be handled in this exceptional way?

Hah, my intent is that we make any change as un-exceptional as possible ;)

Regarding the condition of the challenge, I think it should also include a way out of symdir cycles when representing module dependency cycles.

@phestermcs Can you elaborate on what you mean by "a way out of symdir cycles"?

This is an example where the modules have a cycle, symlinks to those modules are still being used, but themselves dont create a directory cycle, which can cause infinite loops in some forms of tools.

So, is it a case like this?

A -> B
B -> C
C -> A

So, my app depends on A, A depends on B, B depends on C, and C depends on A.

You want to avoid situations where I can end up with a path like myapp/node_modules/A/node_modules/B/node_modules/C/node_modules/A/... because that makes some other file-viewing tool like a text editor or build script or whatever freak out.

Am I understanding it properly? If so, that seems like a fair restriction to me.

Correct. Fwiw, Sticking with subordinate /node_modules would prevent being able to meet this constraint.

Yes, sticking with the somewhat naive "link everything into the store" approach, where things in the store have a node_modules folder with other links to things in the store, fails this approach and also fails the other requirements of a solution.

@phestermcs I mean you have a folder structure like this:

โ”œโ”€โ”€ app
โ”‚   โ””โ”€โ”€ 2.2.2
โ”‚       โ”œโ”€โ”€ index.js
โ”‚       โ””โ”€โ”€ node_modules
โ”‚           โ”œโ”€โ”€ baz -> ../../../baz/1.2.5
โ”‚           โ””โ”€โ”€ foo -> ../../../foo/1.2.3
โ”œโ”€โ”€ bar
โ”‚   โ”œโ”€โ”€ 2.3.4
โ”‚   โ”‚   โ””โ”€โ”€ index.js
โ”‚   โ””โ”€โ”€ 9.8.7
โ”‚       โ””โ”€โ”€ index.js
โ”œโ”€โ”€ baz
โ”‚   โ””โ”€โ”€ 1.2.5
โ”‚       โ”œโ”€โ”€ index.js
โ”‚       โ””โ”€โ”€ node_modules
โ”‚           โ””โ”€โ”€ bar -> ../../../bar/9.8.7
โ””โ”€โ”€ foo
    โ””โ”€โ”€ 1.2.3
        โ”œโ”€โ”€ index.js
        โ””โ”€โ”€ node_modules
            โ””โ”€โ”€ bar -> ../../../bar/2.3.4

That won't work because updating a second app updates the bar module under foo's node_modules folder in the shared store, and also if bar depends on foo, you get a symlink cycle.

This is the naive "use symlinks into a central store" approach, but it falls down on 2 points. (I'm using "naive" as a compliment here, meaning "do the simplest, most obvious, least clever thing".)

@isaacs Ok, I thought that's what you may have meant; just double checking.

But that does remind me of still another "meet the challenge" point. There also is the case of handling "bundled" node_modules as well. i.e. It should still be possible to have a module in a machine store that does have a '/node_modules' subfolder but all of it's content is 'locked' to that module (no symlinks to outside of module), most simply as copies. For example npm itself ships with all its dependencies.

@phestermcs Please let's keep the scope of this problem and solution limited, or else we'll never be able to make progress in reasonable time. Saying that any solution is backwards compatible means that any existing solutions to those problems (for example, just copying files into folders) will continue to work just fine.

Yes, backwards compatibility means that current apps work with the future node. Forwards compatibility, where future apps work with current node, is much harder to attain for any semantic change or bug fix. Presumably the entire point is that there's a problem we want to fix or something we want to do that is not currently possible.

Here's another proposal possibility that might work.

  1. Track both the requested path (where the module was found) and the realpath (where the module actually is) on the module object.
  2. Caches remain as they currently are (using realpath as a key)
  3. node_module lookup path is the node_module directory set from the realpath and then the node_module directory set from the requested path.

I haven't tested it yet, but I believe that would solve the peer dependency issue addressed by --preserve-symlinks (similar to how adding the cwd-based node_module lookup paths would in the proposals in isaacs/node6-module-system-change), and also open the door for two module-store symlink based approaches that could satisfy the necessary criteria listed above.

The first module store symlink approach is to create actual nested directory hierarchies, but symlink the contents of the packages into their destinations. This results in a layout on disk that is more parallel to the naive "disk is cheap" approach, and that parallel is appealing.

However, it necessarily means that the package manager has to keep a record of files in each package, and it's WAY more actual symbolic links. If packages are linked from their development location into the module store, then the developer will have to refresh all of their module installations whenever a file is added to the package. (Maybe that happens rarely? Or can be somehow detected/automated?) So, it doesn't score very highly on the 5th heuristic (minimizing package manager complexity).

That would look something like this in practice (where index.js can be many files, of course)

app1
โ”œโ”€โ”€ index.js
โ””โ”€โ”€ node_modules
    โ”œโ”€โ”€ baz
    โ”‚   โ”œโ”€โ”€ index.js -> ../../../.store/baz/1.2.5/index.js
    โ”‚   โ””โ”€โ”€ node_modules
    โ”‚       โ””โ”€โ”€ bar
    โ”‚           โ””โ”€โ”€ index.js -> ../../../../../.store/bar/2.3.4/index.js
    โ””โ”€โ”€ foo
        โ”œโ”€โ”€ index.js -> ../../../.store/foo/1.2.3/index.js
        โ””โ”€โ”€ node_modules
            โ””โ”€โ”€ bar
                โ””โ”€โ”€ index.js -> ../../../../../.store/bar/9.0.0/index.js

The second module store symlink approach is to lay files out on disk with a package.json file containing {"main":"content"} and a content folder that links into the store. This loses the parallelism with the naive copying approach, but it also means that package contents can be modified without a rebuild. It also adds an additional folder and some extra file system activity on startup, so the performance impact will have to be measured.

It looks like this:

app3
โ”œโ”€โ”€ index.js
โ””โ”€โ”€ node_modules
    โ”œโ”€โ”€ baz
    โ”‚   โ”œโ”€โ”€ contents -> ../../../.store/baz/1.2.5
    โ”‚   โ”œโ”€โ”€ node_modules
    โ”‚   โ”‚   โ””โ”€โ”€ bar
    โ”‚   โ”‚       โ”œโ”€โ”€ contents -> ../../../../../.store/bar/2.3.4
    โ”‚   โ”‚       โ””โ”€โ”€ package.json
    โ”‚   โ””โ”€โ”€ package.json
    โ””โ”€โ”€ foo
        โ”œโ”€โ”€ contents -> ../../../.store/foo/1.2.3
        โ”œโ”€โ”€ node_modules
        โ”‚   โ””โ”€โ”€ bar
        โ”‚       โ”œโ”€โ”€ contents -> ../../../../../.store/bar/9.0.0
        โ”‚       โ””โ”€โ”€ package.json
        โ””โ”€โ”€ package.json

I'll try to find some time to make a patch for this minimal module system change so that we can investigate it further and try to shake out some other tradeoffs.

Once there are at least 2 or 3 options, we can figure out which one satisfies the criteria the best, and with a minimum of new footguns.

@phestermcs Both module store approaches would depend on the same change to the module loader, described at the start of my post in 3 bullets. The existing module loader already knows what to do with the package.json file as described.

The only internal change to node would be that the module search paths include both the node_modules folders based on the realpath as well as the requested path.

If the symlink-folder will be always named contents, why should there be a package.json?

This structure would be enough, wouldn't it? package.json is in contents anyway...

app3
โ”œโ”€โ”€ index.js
โ””โ”€โ”€ node_modules
    โ”œโ”€โ”€ boo
    โ”‚   โ”œโ”€โ”€ contents -> ../../../.store/boo/1.0.0
    โ”‚   โ””โ”€โ”€ node_modules -> ../../../.store/boo/1.0.0/node_modules
    โ”œโ”€โ”€ baz
    โ”‚   โ”œโ”€โ”€ contents -> ../../../.store/baz/1.2.5
    โ”‚   โ””โ”€โ”€ node_modules
    โ”‚       โ””โ”€โ”€ bar
    โ”‚           โ””โ”€โ”€ contents -> ../../../../../.store/bar/2.3.4
    โ””โ”€โ”€ foo
        โ”œโ”€โ”€ contents -> ../../../.store/foo/1.2.3
        โ””โ”€โ”€ node_modules
            โ””โ”€โ”€ bar
                โ””โ”€โ”€ contents -> ../../../../../.store/bar/9.0.0

This solution will work when someone requires the package via the entry point. However, frequently people are using a different file from the package, like require('foo/bar'). If bar.js is not in the root of the package anymore, it won't be accessible the old way...

I wrote up a patch for what I'm suggesting: isaacs/node@d0f9a99

It isn't yet ready for a PR (not least because it needs tests, docs, and the like), and I'd like to drive this discussion towards a shared understanding of the problem and requirements of a solution. I only wrote this patch because sometimes it's better to communicate thoughts in code rather than prose :)

This solution will work when someone requires the package via the entry point. However, frequently people are using a different file from the package, like require('foo/bar'). If bar.js is not in the root of the package anymore, it won't be accessible the old way...

Indeed! That's a shortcoming of the contents folder approach! Symlinking package contents would still work, though.

Another unaddressed concern: loading deps from modules loaded within the dependent module.

For example, if we slightly alter my foo package in the examples above to look like this:

// foo/index.js
console.error('in foo', module.paths)
require('./other-module.js')
// foo/other-module.js
console.error('in foo/other-module.js', module.paths)
var bar = require('bar')
console.log('foo using %s', bar)

Then we get this result:

in foo [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules' ]
in foo/other-module.js [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules' ]
module.js:498
    throw err;
    ^

Error: Cannot find module 'bar'
    at Function.Module._resolveFilename (module.js:496:15)
    at Function.Module._load (module.js:443:25)
    at Module.require (module.js:529:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/other-module.js:2:11)
    at Module._compile (module.js:602:32)
    at Object.Module._extensions..js (module.js:611:10)
    at Module.load (module.js:519:32)
    at tryModuleLoad (module.js:473:12)
    at Function.Module._load (module.js:465:3)

This can be fixed by also passing along the parent module paths to its children modules. I'll add that in soon.

@phestermcs I must admit I don't understand what you're talking about with respect to main.js in your most recent comment. Can you elaborate with a minimal example?

Yeah, this fixes the issue: isaacs/node@f727658

in foo [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules' ]
in foo/other-module.js [ '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/.store/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/node_modules',
  '/Users/isaacs/dev/js/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules/foo/node_modules',
  '/Users/isaacs/dev/js/node6-module-system-change/x/app1/node_modules' ]
foo using bar 9.0.0
baz using bar 2.3.4

EDIT: updated commit link after rebasing to node master

@isaacs

node must "experience" the /node_modules rooted symbolic directory structure just like a human would when viewing through a file system explorer and expanding the directory hierarchy. Converting "main.js"'s path all the way down to its absolute realpath on program launch, and using that as it's __dirname prevents this, and breaks tooling.

Effectively, all modules derive there __dirname from the __dirname of the parent module that first required them, and all require resolutions are relative to __dirname. The main modules __dirname, a function of "main.js"'s path, is the root, or start, of this.

Example

In practice the structure and how accessed (tooling launching tooling, etc) is more complex and has different patterns, so this is a contrived simplification to highlight the fundamental problem inherent to all of them.

/store
  /peer
    /v1
      index.js
  /mod
    /v1
      index.js

/projectA
  /bin
    mod        -> ../node_modules/mod/index.js
  /node_modules
    /peer      -> /store/peer/v1
    /mod       -> /store/mod/v1
      index.js                        // var = require('peer')

$> node /projectA/bin/mod
Exception: cant find module 'peer'

With current behavior of converting "mod"'s path to its realpath, and using as the main modules __dirname, the require('peer') attempts to resolve relative to the "real path" /store/mod/v1, when it should resolve from the "symbolic path" /projectA/node_modules/mod.

In your proposed solution, using a realpath in the search list will also have the same effect, just lower down in the link tree.

@phestermcs I'm trying to find solutions that do not add additional new API surface, including new folder paths that users might be confused by.

Adding module+node_modules as a folder to be searched at each level is an additional thing to document and explain. That is a cost. But some of these edge cases (not finding peer deps, resolving based on the real path only rather than linked path, etc.) do routinely come up as misunderstandings of how the current system works, so there's less cost in adding support for them.

And since it only adds support for something that didn't work at all before, and only at a lower priority than the current search paths, there's very little chance of someone's existing use case being disrupted.

Adding lookup paths will have a performance impact as well. We've done a lot of work recently to make all this faster and if we go the .esm router we're already taking another perf hit from lookups. I'm a little worried about the performance regression this could cause.

@phestermcs @mikeal Yes, I agree, that's a concern. Adding 2 or 3 additional folder lookup paths is probably fine. Adding 10 is less likely to be ok.

So, if we go from this:

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules' ]

to this:

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo/node_modules',
  '/Users/isaacs/dev/app4/node_modules' ]

then that feels a lot less risky to me than this:

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/1.2.3+node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/foo+node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/.store+node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/dev+node_modules',
  '/Users/isaacs/node_modules',
  '/Users/isaacs+node_modules',
  '/Users/node_modules',
  '/Users+node_modules',
  '/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo+node_modules',
  '/Users/isaacs/dev/app4/node_modules',
  '/Users/isaacs/dev/app4+node_modules' ]

@isaacs @mikeal I had this very same concern, but in normal use, most of the time only the first couple of paths actually get searched before somethings found, not all of them, and with the caching enhancements, the actual stat() cost only happens once

For now, don't think of my solution as 'the' solution, but as a fully working one.. a baseline.. a bar to meet. a way to actually experience any solution.

I think it makes sense to lookup for the adjacent node_modules only one directory up. In that case it won't be +10 additional directory lookups but only +1.

[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
  '/Users/isaacs/dev/.store/foo/1.2.3+node_modules',
  '/Users/isaacs/dev/.store/foo/node_modules',
  '/Users/isaacs/dev/.store/node_modules',
  '/Users/isaacs/dev/node_modules',
  '/Users/isaacs/node_modules',
  '/Users/node_modules',
  '/node_modules',
  '/Users/isaacs/dev/app4/node_modules/foo/node_modules',
  '/Users/isaacs/dev/app4/node_modules']

although my example is not correct, because I looked up in the store

@zkochan when a root tree would be deployed using anm there are no more /node_modules other than the root, they're now all +node_modules. and mechanically speaking resolution works just like /node_modules

I see, but they will always be one level up, right? There is no need to check in each parent directory till the root of the disk. Hence, it won't add more lookups. I think the lookup can be even stopped if there's an adjacent node_modules dir.

@zkochan whoever implemented the stat() caching did a great job, because stat()ing all the time is what would kill performance. But once a path has been stat()'d, it never is again, no matter how many more times requires happen against it and its parents.

@phestermcs It is very difficult to keep up with this conversation and address everything you bring up when you post updates every 10 minutes. That is why I've been trying to block out a bit of time to identify some issue, thoroughly examine it, sleep on it for a few days, write an example implementation, and post a fully digested update.

When you say things like "Thats means, the anm solution is ONE LINE change" or "Does it even matter to anyone here that I ran citgm and results were identical? and that citgm itself was running with the altered version of node?", I interpret this to be somewhat demanding and frustrated. It's possible, since text is what it is, that I'm misinterpreting your intent, and so I am attempting to give you this feedback only as an expression of how your message is being received, in the hopes that you might benefit from this knowledge.

If I'm not misinterpreting your urgency or agitation, then I must again insist that you please show restraint and respect for the process. This will take a long time. We will examine every angle very thoroughly. Demanding that we skip this process this will not make it go faster. Broadcasting attachment to a specific outcome will not guarantee that outcome. The process is here for a reason.

Node is no one's pet project. There are millions of people depending on its stability. Please be at peace in this space.

@phestermcs Just to be clear, you didn't piss me off, and I don't see anyone else suggesting that they're pissed off at you. I am merely pointing out that the urgency and frustration you are bringing to this issue is not helping drive us towards a resolution. Additionally, using superlatives about a specific solution also makes it harder to do this work. The task before us is to very clearly identify the problem to be solved, and explore the solution space.

I hate to keep repeating it, so this is the last time I'll say this: it is important to be excessively careful about any changes to node's module system. Frustration and urgency are the opposite of the ideal approach.

I'm not that smart, and I only have a limited slice of my attention to spend on this, but what I do offer is a wealth of experience and institutional memory about how the module system is designed, and why. I ask that you be patient with me. It's only been a few days since I started looking at this. We will all benefit if we can do this very carefully and explicitly, so that there are multiple options on the table, and the problem space is mapped out in excessive detail.

bmeck commented

@phestermcs Core contributors are very careful with things that affect the Module system. Increased support for symlinks would be great! However, part of the problem is the difficulty in knowing how the packages in the wild on the public registry rely upon certain features and/or constraints of the module system. I can speak from brutal experience; understanding the full surface of what would break by using ESM took me over 1 year. There will be some breakage I did not plan for even after over a year of investigation.

As such, I would not rush to give a set of tests that need to pass. Going out on the public registry, github, etc. and searching for breakage takes time. Understanding path expectations is even more confusing than how module.exports/require is used since path resolution is mostly transparent due to it being on the fs and not visible in source code.

This enhancement will most likely take a long time. People will incrementally rebuild your solution (or different ones) to understand the edge cases of the problem space themselves and present minor variations from their explorations. Watching this will take a lot of patience. They will get tripped up on things that may already be known to need workarounds/solutions, but they may discover new things.

I cannot give you a list of expectations on speed or clarity in specifying the requirements to land this. Large or ecosystem affecting PRs take long times; for example, the Workers PR is over a year old. We will get through this, slow and steady. I agree that we should setup a call in the near future, and I can try an email as a means to organize a hangout.

bmeck commented

CC: @groundwater , is there an Electron person who can read through this and see if symlink paths affect things at all (it may not)?

@phestermcs ... I understand that this issue is important to you. I also understand that you have a strong desire to contribute meaningfully to both to this issue and to Node.js in general. Those contributions and your efforts are definitely appreciated. However, please do keep in mind that how thoughts and ideas are communicated can be disruptive to achieving your goal. Frequent long messages, calling out individual contributors with big bold text (that comes across as shouting on here), and just the overall general tone of your posts are contributing to what many participants and observers in this conversation are interpreting as counter-productive (at best) and harassing (at worst) behavior. I do want this issue to be resolved, but if I may suggest, please take a bit more care in how you are engaging here and please pause before posting to consider how your posts may be coming across to others. It will likely be far easier to make progress if you take a significantly less confrontational and urgent approach in your messages.

My apologies @isaacs. In humility ask you and other for fogiveness.

@phestermcs Forgiven and forgotten. I am eager to move on.

So, I'm confused by something.

Let's say you have a module at /store/foo/index.js.

Then, that module does require('./lib/thing.js'), loading from /store/foo/lib/thing.js.

Then, in /store/foo/lib/thing.js, it does require('bar') to pull in foo's dependency, "bar".

I then symlink the foo module in to /app/node_modules/foo.

When lib/thing.js does require('bar') what paths get searched?

Today, it would be these:

/store/foo/lib/node_modules
/store/foo/node_modules
/store/node_modules
/node_modules

In the "add node_module folders for request path, as well as parent module" proposal I have on my ep-46 branch, it'd search in these:

/store/foo/lib/node_modules
/store/foo/node_modules
/store/node_modules
/node_modules
/app/node_modules

I thought I had wrapped my head around the search paths that would be used with anm, but your reaction to my comment above about adding 10 new search paths left me confused. What exactly are you suggesting we add to the search path set in this case?

Because it can't just be __dirname + '+node_modules' or else the thing.js will look in /store/foo/lib+node_modules first, which doesn't seem very worthwhile. And if it's only based on the main module, then that means that it's going to search in /app+node_modules, which also doesn't seem right.

What are "adjacent node_modules" adjacent to, exactly?

Thanks @isaacs for your graciousness; it's warmly received.

First, allow me to just clarify the basic idea behind adjacent+node_modules. It's fundamentally identical to /node_modules which I call subordinate/node_modules as in that case the node_modules directory is "underneath" it's module. "adjacent" just means the node_modules directory for the module is "next to" its module's directory rather than underneath it. Its really just the exact some thing, but using the + instead of the /, which then decouples it from the module directory itself. Once it 'clicks' for you I think you'll appreciate its simplicity, and how its just as elegant as, and in the spirit of /node_modules with respect to handling version resolution, because it's pretty much the same thing

/app
  /node_modules
    /depA/node_modules
    /depA+node_modules
    /depB/node_modules
    /depB+node_modules

Now, for how things should ideally resolve in your example, the first thing is that, and this is related to the "main.js" fix, /store/foo would not be the __dirname used to resolve anything, assuming it had been symlinked. From your example, I'm guessing this is how the layout is:

/store
  /foo
    index.js
    /lib
      thing.js

/app
  index.js
  /node_modules
    /foo          -> /store/foo

launching with:
node /app

the main module's __dirname is set to /app

then:

app/index.js
    require('foo')

the search paths would be:

/app/node_modules
/app+node_modules
/node_modules

when foo is loaded, its __dirname is set to /app/node_modules/foo,
so for:

foo/index.js
  require('./lib/thing.js')

the search path would be relative to foo's __dirname (because of the leading ./):

/app/node_modules/foo

so then thing's __dirname is set to /app/node_modules/foo/lib

so

thing.js
  require('bar')

search paths:

/app/node_modules/foo/lib/node_modules
/app/node_modules/foo/lib+node_modules
/app/node_modules/foo/node_modules
/app/node_modules/foo+node_modules
/app/node_modules
/app+node_modules
/node_modules

Does that make sense?

I should mention, I used your example, which symlinked under /node_modules, so the detail I showed you is pretty much describing backwards compatibility; i.e. +node_modules are in the search list, but they don't exist, so nothing changes

Ok. So what happens if we symlink /app/bin.js to /usr/local/bin/app? The main purpose of resolving symlinks to their realpath for module resolution is that linked files typically need to find their modules based on their actual location on disk.

/use/local/bin/app is a file symlink and /app/bin.js is its target. And that is exactly why when "main.js" is a file-symlink, we simply use it's target path to set the main modules __dirname (that's very, very, very different than converting the file symlink all the way down to it's realpath), but in all other cases we just preserve the entire path.

Keep in mind the target of the link can also contain symdirs in its path, and we want to preserve those.

If the target is a relative path, than we want to resolve it relative to the file symlinks path.dirname(), which may also have symdirs in it's path, which will also want to preserve.

This sounds a little more involved than it actually is, but it always keeps everything correctly rooted in a symbolic path space, so all descendant requires resolve intuitively.

I believe because "main.js" is currently always converted to its realpath, it's the fundamental reason so much tooling broke. I would consider this a bug.

So, if I'm understanding your proposal correctly, then:

  1. require.main is not resolved to a full realpath, but only the first level if it is a symbolic link, and only once. (For example, if /usr/bin/app -> /app/bin.js and /app -> /home/dev/app, or some other chain of symbolic links, it won't get fully resolved.)
  2. Modules are not realpath'ed (or symlink-resolved at all) from there on.
  3. When a module is found, its node_modules hierarchy includes <path>/node_modules as well as <path>+node_modules for all the elements in its path, starting with the path.dirname() of where it is found.

It seems to me then that this breaks the current behavior in at least the following cases (which may or may not be resolvable):

  1. A file symlink that is not the main module. For example, if /store/foo/lib/blerg.js is a symlink to /store/bar/blerg.js, then it won't find modules in /store/bar/node_modules. (This happens more often in the wild than you might expect. For example, scripts that do require(which.sync('some-module-exe')) to load the module's executable.)
  2. A file symlink that is a link to a symlink, even as the main module. For example, /usr/local/bin/app -> /store/app/bin and /store/app/bin -> /store/app/command-line/index.js. In this case, /store/app/command-line/index.js will not find modules in /store/app/command-line/node_modules.
  3. Since anm only looks one symlink deep for the main module, and doesn't realpath the whole way, it doesn't include any node_modules folders that are found relative to the realpath but not relative to the first-level link path. For example, /projects/js links to /home/isaacs/projects/js. /bin/app links to /projects/js/app/bin.js. /home/isaacs/node_modules won't be in the module search path.

Please let me know if these deviations from the current contract represent a misunderstanding on my part. Thank you.

  1. require.main is not resolved to a full realpath, but only the first level if it is a symbolic link, and only once. (For example, if /usr/bin/app -> /app/bin.js and /app -> /home/dev/app, or some other chain of symbolic links, it won't get fully resolved.)

..and only if /usr/bin/app were a file-symlink. /symdir/modir/ and the like are left alone.

  1. Modules are not realpath'ed (or symlink-resolved at all) from there on.

..which is exactly how --preserve-symlinks works today.

  1. When a module is found, its node_modules hierarchy includes <path>/node_modules as well as <path>+node_modules for all the elements in its path, starting with the path.dirname() of where it is found.

Correct. However, I consider this the anm enhancement, and an entirely separate thing from 1 & 2, which are just about working with symlinks generally. anm has no idea if symlinks are involved or not, it just makes using them to machine stores possible. They really should be seen is two discreet things.

Re breaking current behavior

  1. A file symlink that is not the main module. For example, if /store/foo/lib/blerg.js is a symlink to /store/bar/blerg.js, then it won't find modules in /store/bar/node_modules. (This happens more often in the wild than you might expect. For example, scripts that do require(which.sync('some-module-exe')) to load the module's executable.)

How require(which.sync('some-module-exe')) would behave is exactly as it does today with --preserve-symlinks. The handling of a file-symlink is only for "main.js", not for other require() calls. As an aside, in your example the path passed to require would be an absolute path I believe.

  1. A file symlink that is a link to a symlink, even as the main module. For example, /usr/local/bin/app -> /store/app/bin and /store/app/bin -> /store/app/command-line/index.js. In this case, /store/app/command-line/index.js will not find modules in /store/app/command-line/node_modules.

For handling "main.js", that is correct. I have not encountered that pattern, but I believe file-symlinks to file-symlinks, however many levels, could be simply handled by just looping the "main.js" logic until a non file-symlink is hit, which would still be preserving any symdirs along the way.

  1. Since anm only looks one symlink deep for the main module, and doesn't realpath the whole way, it doesn't include any node_modules folders that are found relative to the realpath but not relative to the first-level link path. For example, /projects/js links to /home/isaacs/projects/js. /bin/app links to /projects/js/app/bin.js. /home/isaacs/node_modules won't be in the module search path.

anm has nothing directly to do with symlinks, it just makes using them to machine stores possible.

Your example with additions to highlight use case as to why that might not be preferred.

/home
  /isaacs
    /node_modules
      /bar               // @ 3.0.0
    /project
      /js
        /app
          bin.js         // @ 1.0.0

/projectA
  /node_modules
    /bar                 // @ 1.0.0
  /js                    -> /home/isaacs/projects/js
  yarn.lock
    "app" @ 1.0.0
    "bar" @ 1.0.0

/projectB
  /node_modules
    /bar                 // @ 2.0.0
  /js                    -> /home/isaacs/projects/js
  yarn.lock
    "app" @ 1.0.0
    "bar" @ 2.0.0

/bin/appA                -> /projectA/js/app/bin.js
/bin/appB                -> /projectB/js/app/bin.js

You are correct. However, this "feels" similar to the scenario of concurrent development of interdependent modules on the same machine, and in that case it might be preferred ancestor dependency version resolutions stay with the project. But yes, you are correct.

Fwiw, Just on principle, always preserving the symbolic space will allow the user (even if via tooling) to fundamentally always be in full control of exactly how they want to organize their directory structure. When converting to realpaths, that control is taken away.

No known problems with ESM on this front. If we were talking archive formats that serialize symlinks like ASAR / NODA / WebPackage we might have things to think about, but none are currently supported by Node.

๐Ÿ‘‹, Electron person here, trying to understand the book worth of text that's written about this - fwiw, ASAR doesn't support symlinks at all, so we're good on that front. From our perspective, this probably won't affect anything (i.e. we're pretty much 100% compatible with node in this regard).

Thanks for CC'ing someone from Electron into the thread!

@isaacs You described certain symlinking techniques which I'm sure might exist. I'm just curious as to the use cases they are supporting, and just generally the frequency of them, and how the disparity between nix and win regarding use of file-symlinking impacts them? With respect to preserving symlinks allowing the user to remain in control of their directory structure (and by consequence how dependencies resolve), I reason those use cases could still be supported by some minor organizational changes, should the change in behavior break them.

It seemed some of your concerns revolved around those situations leveraging the fact node searches for /node_modules directories all the way to the root, as an approach to sharing common modules between applications, or "main" entry points. It appears the trend, as evidenced by npm's shrink-wrapping, yarn's lock files, and ied's CAS model, is to fully isolate version resolution to "main" entry points so behavior, with respect to the precise versions of modules being used, can always be deterministic (i.e. all resolutions resolve from "mains" companion /node_modules directory). Relying on a common ancestor /node_modules to share dependencies between "main" entry points opens a hole in the determinism when applications run on different machines, because control is lost in guaranteeing version resolution from machine to machine. Being able to symlink to machine stores is another way modules can be shared between applications, but still allow them to fully dictate the precise versions they require. Just curious what your thinking is in that regard?

One area we've yet to cover, is how the memory-bloat and add-on crashing problems created when using symlinks get fixed. Interestingly the solution to both is the same, and also enables using symlinks to easily address the issue of peer dependencies, without having to be concerned with the physical directory structure and how many places the peer dependency is depended upon, because all of them come down to only having a single instance of a module@version in memory. This is fundamentally solved by always caching module instances by their realpath (and this is a case where we do want to go all the way down to the real path), yet setting the __dirname to the requestPath of the first require() that originally loaded the module into memory. It does rely on package managers laying down physical dependency trees that honor the logical tree; i.e. within an execution context, the versions of a module's dependencies never change just because the module itself may be multiply depended upon. Also curious as to your thinking on this?

(I'm really trying to use less words and comment less often... please bear with me as I try to get better at that in this forum)

@phestermcs I apologize for mixing up terms. Yes, when I was saying anm above, I meant it to refer to your entire proposal for handling symlinks, which is more than just the added <x>+node_modules search paths. However, without that additional symlink handling, it doesn't really address the needs of this EP issue, because it does not "improve support for symlinks in the module system".

Moving on, if something is "the same as --preserve-symlinks" then that does not mean it is "backwards compatible". In fact, it means it is not backwards compatible, which is why --preserve-symlinks is behind a flag as of 6.2. That behavior broke many existing use cases in the wild when it was initially released in 6.0, leading to a lot of public outcry. Since it is now behind a flag, and a breaking change, it's unlikely to ever get widespread adoption. Eventually, the cost of maintaining the forked behavior switch will outgrow whatever tiny value it provides to a tiny section of the community, and it'll be removed.

That's what I'd like us to avoid. That's why it may seem like I'm throwing your proposal out without giving it a fair shake, because there are already several identifiable points where it does not satisfy the criteria for a solution in this case.

If we can come to a solution that also supports the peerDependencies use cases that motivated --preserve-symlinks, it may hasten the demise of that switch, which would also be a bonus in my opinion.

It's also worth noting the role that citgm plays in this sort of change, since it's come up a few times earlier in the conversation.

"Citgm" stands for "canary in the gold mine", a reference to sacrificial birds used as early warnings for poison gas leaks in mines. Its purpose is to be a ratchet that indicates known cases where a change to node will cause breakage in important parts of the community.

While passing citgm is necessary for backwards compatibility assurances, it is not sufficient for calling something backwards compatible. If a proposal passes citgm, but we humans can point to some cases where it breaks a use case that is not included in citgm, then it's still not backwards compatible. Citgm is a test suite, not a specification.

I'm not saying that something like adjacent node modules is completely unacceptable. In fact, since @zkochan and @phestermcs rightly pointed out that the {"main":"contents"} package.json approach doesn't support require('foo/bar.js'), then that leaves "symlink all the files" as the only suitable module management approach identified so far using the node lookup path proposals on my ep-46 branch. As a module management approach, that is a bit more complicated, so it is worth continuing to investigate.

As a final point, symbolic links are supported just fine in Windows, provided that either (a) the third option of "dir" or "file" is passed to fs.symlink, and the target is an absolute path, or (b) the user is a system administrator. What isn't supported in Windows at all is shebang lines in executable scripts, which is why for example npm creates .cmd shim files for package executables.

@phestermcs I've edited your last post. This isn't a political forum and I found the reference you made to be offensive.

@phestermcs I'm not interested in any in-person or skype meetings at this time, and this issue is not the appropriate forum to discuss changing the node project's process. Multiple project admins have attempted to assist you in participating productively. At this point, after derailing the conversation repeatedly, I have lost confidence that you're willing to engage in good faith on the technical issue at hand.

I'm going to keep working on this technical issue, but I strongly recommend that the admins block you from participation with this project if you cannot behave appropriately.

I have nothing against you personally. I just want to get this issue fixed for our users, and you're standing in the way of progress at this point.

@phestermcs : once again, it's fairly clear that how you are choosing to engage in this conversation is what is getting in your own way of making progress and is what is turning individuals off of the idea of working with you to resolve the issues here. I would like to ask that you please take a couple of days break from this issue and should you choose to continue engaging in this conversation, please post less frequently, with fewer words, and far less aggressively. We are all extremely busy and exceedingly few people here, if any, will have either the time or desire to read through giant walls of text that do little to move the conversation forward.

ftr. i removed several comments as they were irrelevant to the issue and inappropriate. i've exited the situation

Please keep this issue open. Paul has transferred me his repos. I will try to test whether his adjacent node_modules solution satisfies all the requirements listed by @isaacs.

This issue is very important for the pnpm/ied communities.

Took a while, but I have caught up with this thread. I wanted to share some code that I've been working on https://github.com/wmhilton/modinst Currently, it uses @phestermcs's branch, but I'm not certain that my result could not be achieved without it. In fact, right now my brain thinks it could work using the existing --preserve-symlinks behavior if combined with npm3 style flattening, but I would need to think it through more.

The aspect of @phestermc 's contribution that I've latched onto is the idea that we could disentangle the physical module location from the require paths. So that's what I've tried to do in my prototype module installer. My implementation works kind of like this. (directory paths have been shortened for illustration)

Packages are extracted into a .store directory. This is roughly following the ".store" layout that @zkochan (pnpm) and @alexanderGugel (ied) are trying to develop. This is a real example by the way. :)

~/.store/async/0.9.0/
~/.store/colors/1.0.3/
~/.store/corser/2.0.1/
~/.store/ecstatic/1.4.1/
~/.store/eventemitter3/1.2.0/
~/.store/he/0.5.0/
~/.store/http-proxy/1.16.2/
~/.store/http-server/0.9.0/
~/.store/mime/1.3.4/
~/.store/minimist/0.0.10/
~/.store/minimist/0.0.8/
~/.store/minimist/1.2.0/
~/.store/mkdirp/0.5.1/
~/.store/opener/1.4.2/
~/.store/optimist/0.6.1/
~/.store/portfinder/0.4.0/
~/.store/qs/2.3.3/
~/.store/requires-port/1.0.0/
~/.store/union/0.4.6/
~/.store/url-join/1.1.0/
~/.store/wordwrap/0.0.3/

Packages are "installed" (I've been mentally using the term "linked") in a different root folder:

~/.lib/.bin/hs*
~/.lib/.bin/hs.cmd
~/.lib/.bin/http-server*
~/.lib/.bin/http-server.cmd
~/.lib/http-server -> ~/.store/http-server/0.9.0/
~/.lib/http-server+node_modules/colors -> ~/.store/colors/1.0.3/
~/.lib/http-server+node_modules/corser -> ~/.store/corser/2.0.1/
~/.lib/http-server+node_modules/ecstatic -> ~/.store/ecstatic/1.4.1/
~/.lib/http-server+node_modules/ecstatic+node_modules/he -> ~/.store/he/0.5.0/
~/.lib/http-server+node_modules/ecstatic+node_modules/mime -> ~/.store/mime/1.3.4/
~/.lib/http-server+node_modules/ecstatic+node_modules/minimist -> ~/.store/minimist/1.2.0/
~/.lib/http-server+node_modules/ecstatic+node_modules/url-join -> ~/.store/url-join/1.1.0/
~/.lib/http-server+node_modules/http-proxy -> ~/.store/proxy/1.16.2/
~/.lib/http-server+node_modules/http-proxy+node_modules/eventemitter3 -> ~/.store/eventemitter3/1.2.0/
~/.lib/http-server+node_modules/http-proxy+node_modules/requires-port -> ~/.store/requires-port/1.0.0/
~/.lib/http-server+node_modules/opener -> ~/.store/opener/1.4.2/
~/.lib/http-server+node_modules/optimist -> ~/.store/optimist/0.6.1/
~/.lib/http-server+node_modules/optimist+node_modules/minimist -> ~/.store/minimist/0.0.10/
~/.lib/http-server+node_modules/optimist+node_modules/wordwrap -> ~/.store/wordwrap/0.0.3/
~/.lib/http-server+node_modules/portfinder -> ~/.store/portfinder/0.4.0/
~/.lib/http-server+node_modules/portfinder+node_modules/async -> ~/.store/async/0.9.0/
~/.lib/http-server+node_modules/portfinder+node_modules/mkdirp -> ~/.store/mkdirp/0.5.1/
~/.lib/http-server+node_modules/portfinder+node_modules/mkdirp+node_modules/minimist -> ~/.store/minimist/0.0.8/
~/.lib/http-server+node_modules/union -> ~/.store/union/0.4.6/
~/.lib/http-server+node_modules/union+node_modules/qs -> ~/.store/qs/2.3.3/

As is, that directory structure wouldn't let you globally install multiple versions of http-server, but you could probably append the version number to the executables and then use more symlinks, like how Ubuntu handles python, python2, python3.

During runtime, I presume the require paths are something like this for every module:

~/.lib/http-server/node_modules  Oh, that doesn't exist.
~/.lib/http-server+node_modules  OK

I believe that this arrangement will allow for require('lodash/merge') style requires but I haven't tested it.

I believe this arrangement fails the problem @isaacs posed earlier where two modules require the same package but need to get separate copies in memory, because packages are cached by their real path in the v7.2.0-sjw branch. However, if one used the symlink path to do module caching, which IIUC is what --preserve-symlinks did, then I think this achieves functional equivalence with npm2 installs? Then making sure modules require the same instance in memory could be done by hoisting dependencies (or whatever "flattening the directory structure" is called) could it not?

Perhaps this arrangement works because I only have ONE symlink in each path. Something I haven't considered is whether all hell breaks loose if you permit packages in ~/.store to also use symlinks.

This arrangement has one extremely cool property. Because all "installations" are done outside of the module store, you can have the entire npm repository mounted at ~/.store (disk space is cheap, right?), and install any combination of packages you want simply by creating symlinks and empty directories in ~/.lib. The amount of disk space OR time required to install packages would be very minimal. You could install every version of every npm package, ever, on one machine. (Minus the one's with post-build steps or that have other environment changing side-effects when you install them.) If this sounds excessive, think about how it could benefit continuous integration services or citgm-style systems that have to rapidly install many packages.

@wmhilton Does that get around the breaking cases I mentioned above?

If we "improve support for symlinks", but only via the use of a flag, then that won't actually improve life for the vast majority of Node users, and adds complexity to the run-time for little benefit. If we accept a solution that is not backwards compatible, then it can't be outside of a flag.

That's why I am pushing for finding an approach that makes ied style module stores possible, without breaking compatibility with the existing use cases that Node supports today, even if it adds a bit more complexity to the tooling. With my ep-46 branch, no existing use cases are broken, and while the tooling needs to do quite a lot of extra work, I think it's heading in the right direction.

@isaacs Could you please clarify a few things about your solution?

  1. Are you adding only the app's node_modules folder to the lookup or on each level? E.g. in this application, when requiring bar in the foo package, will the app/node_modules/foo/node_modules be in the lookup?
store
+-- foo@1.0.0
    +-- index.js
    +-- package.json
+-- bar@1.0.0
    +-- index.js
    +-- package.json

app
+-- node_modules
    +-- foo
        +-- index.js (symlink to ~/store/foo@1.0.0/index.js)
        +-- package.json (symlink to ~/store/foo@1.0.0/package.json)
        +-- node_modules
            +-- bar
                +-- index.js (symlink to ~/store/bar@1.0.0/index.js)
                +-- package.json (symlink to ~/store/bar@1.0.0/package.json)
  1. Currently in pnpm/ied, we are symlinking the package directories to the store, your solution would require symlinking of each file separately. Is my understanding correct?

@zkochan

Are you adding only the app's node_modules folder to the lookup or on each level? E.g. in this application, when requiring bar from the foo package, will the app/node_modules/foo/node_modules be in the lookup?

Yes, my ep-46 branch adds two changes:

  1. When a module is found at linkpath, it is resolved to realpath, and then the node_modules folders for realpath are added to the lookup path set, as well as any node_modules directories from linkpath that are not already present in the set.
  2. Furthermore, when a module is loaded, it has a parent module that is the thing that require()'ed it. Any paths in the parent module's lookup set are added to the child module's, if they are not already present in the set.

Item (1) makes it so that foo/index.js (which is a link to .store/foo@1.0.0/index.js) finds the bar module at app/node_modules/foo/node_modules/bar, because app/node_modules/foo/index.js is the linkpath and .store/foo@1.0.0/index.js is the realpath.

Item (2) is necessary when a package has more than a single file. For example, if ./blerg.js were loaded from .store/foo@1.0.0/index.js, then the linkpath and realpath are both the same: .store/foo@1.0.0/blerg.js. If blerg.js then does require('bar'), it wouldn't find it. By adding the parent module's node_modules set as well (which include the paths based on the foo link path), it finds the module at app/node_modules/foo/node_modules/bar/index.js.

Currently in pnpm/ied, we are symlinking the package directories to the store, your solution would require symlinking of each file separately. Is my understanding correct?

Since it's backwards compatible, ied and pnpm could continue to work exactly the same as they do today.

However, if they wanted to allow for updating in place without altering the meta-deps of other packages that have the same dependency, and also support require('foo/bar.js'), then yes, they'd have to change to symlink each file into a unique directory per project, rather than just symlink the folder into place.

This also gets us the benefits that motivated --preserve-symlinks, by allowing linked peer dependencies to find their shared dep at the app level. (Technically only the parent-node_modules-inheriting bit is needed for that feature, but then you can't do the other stuff.)

bmeck commented

@isaacs we need to be careful with ESM and .parent, there are times where it is ambiguous what is the parent (since there can be multiple)

a imports b
a imports c
c imports b

b evals # what is the parent?
c evals
a evals

@bmeck Ah, that's a great point!

Since it's strictly additive, we could say that all of the parents in the import step can add their unique node_modules folders to the set.

Alternatively, we could say that only the first importer is the "parent", which is closer to how CJS modules would work anyway. That is, even if a CJS module is loaded multiple times, the "parent" is always the first one, and it's cached forever.

bmeck commented

@isaacs if it isn't kept up to date w/ all parents doesn't that mean order of imports becomes important for ESM's resolution algorithm (it isn't currently)?

@bmeck Yeah, whether it includes all parents or just the first one, any inheritance of node_modules paths will mean that the order of imports is relevant. Consider this layout on disk:

.
โ””โ”€โ”€ node_modules
    โ”œโ”€โ”€ baz -> ~/.store/baz
    โ”‚   โ””โ”€โ”€ node_modules
    โ”‚       โ”œโ”€โ”€ bar -> ~/.store/bar   (a)
    โ”‚       โ””โ”€โ”€ bloo   (v1)
    โ””โ”€โ”€ foo -> ~/.store/foo
        โ””โ”€โ”€ node_modules
            โ”œโ”€โ”€ bar -> ~/.store/bar   (b)
            โ””โ”€โ”€ bloo   (v2)

So, there are two copies of bloo on disk. If bar does require('bloo'), then (with isaacs/node@ep-46 proposal) it'll get the one from the module (foo or baz) that first loaded bar.

Even if both of the parent modules' node_modules folder paths are added to bar's set, whichever parent is added first will have higher priority.

Currently, bar will just fail when it runs require('bloo'). With --preserve-symlinks, it'll load 2 copies of bar into memory, and with bar (a) getting bloo (v1) and bar (b) getting bloo (v2). If we land isaacs/node@ep-46, then it'll depend on whether the app first loads foo or baz. If foo is loaded first, then bar (b) will get loaded, and pull in bloo (v2). When baz is loaded, it'll get bar from cache, including bloo (v2). If baz is loaded first, then the reverse will happen.

I'm honestly unclear about the best way to decide which of these approaches is best, even before worrying about ESM. Certainly resolving everything is better than resolving nothing (ie, --preserve-symlinks), if only for backwards compatibility, but the non-determinism of any compromise makes me a little bit itchy.

Does that get around the breaking cases I mentioned above? -@isaacs

1. A file symlink that is not the main module. Haven't tested it.

2. A file symlink that is a link to a symlink, even as the main module. Still fails.

3. Since anm only looks one symlink deep....../home/isaacs/node_modules won't be in the module search path. I'm pretty sure that is the desired behavior though, isn't it?

I have been thinking about it, and I've concluded that symlinks per se aren't needed to pull off machine-level stores. For instance, it would be an acceptable compromise to wrap commands with a shim, because npm already does this on Windows. If we're going to need to wrap top-level applications in a shim regardless, we might as well use that shim to monkey-patch module.require. Maybe everyone will boo and hiss at that idea, but I'm really a fan of keeping core lean, and being able to solve machine-level stores purely with userspace code appeals to me. Thoughts?

bmeck commented

@wmhilton the ESM proposal does not allow modifying the resolution algorithm. The very specific hooks that are safe to allow are shown in the slides from the Sept TC39 meeting. I would take a look at that and see if it is possible using ESM cache manipulation or if your solution only working in CJS works.

@wmhilton #46 (comment) Warning! This is definitely possible, but leads almost invariably to a very dark and twisted maze of shims and complexity.

The reason that node and npm switched to exclusively using node_modules paths for resolution as of npm 1.0 and node 0.4 is precisely because the "shims upon shims" creates a lot of surface area for edge case bugs, and makes programs difficult to debug.

I've spoken about this with a few folks who actually see the "update metadeps at a distance" as a feature rather than a downside of ied and linked module directories. (Not people using ied itself, but using npm link, and eager to try out ied/pnpm for this reason, because "it sounds like npm link, but even better".) Again, I'm not sure how this changes the breakage/value considerations here, but at least, it's worth keeping in mind that removing support for "lookup deps based on realpath" is not going to be an unmitigated benefit.

@bmeck I think that we should be very wary of adding any features that will make ESM have to behave (even more) significantly different from CJS modules. The more parallels we can maintain, the better it'll be for people transitioning from one to the other.

Thanks for sharing these slides. I'll review these this week and try to see if I can imagine a way to conceptualize the same effective behavior in a way that doesn't violate the constraints of ESM loading.

The nice thing about both the current behavior and --preserve-symlinks is that the module lookup location is only based on the module itself, and not the order of loading or the first parent module.

At first glance, it looks like it'll be very difficult to maintain this unless we go the route (like --preserve-symlinks) of saying that a module loaded from two different symlink locations is a fundamentally different module. That is, taking up another spot in the cache and in memory. It may be possible to start with some of the presumptions of --preserve-symlinks, but walk back from there to adding the realpath-based lookup set. This will of course break our de facto singleton behavior, but that's not an entirely safe assumption anyway.

Warning! This is definitely possible, but leads almost invariably to a very dark and twisted maze of shims and complexity.

Just to be clear, I mean this warning in the sense of "That is going to be an exciting adventure!", and I strongly encourage you to try it, if you've got the time and are in the mood for adventure. But it definitely won't be safe, so it shouldn't be something that npm or node-core try to do :)

@isaacs I wanted to try your fork of Node with a slightly modified pnpm.

I had this error when trying to run the pnpm tests:

module.js:498
    throw err;
    ^

Error: Cannot find module '/home/zkochan/src/pnpm/node_modules/.bin/_bin.js'
    at Function.Module._resolveFilename (module.js:496:15)
    at Function.Module._load (module.js:443:25)
    at Module.runMain (module.js:643:10)
    at run (bootstrap_node.js:420:7)
    at startup (bootstrap_node.js:139:9)
    at bootstrap_node.js:535:3

Is your fork of Node ready for experimenting with? Let me know if I can help with something.

bmeck commented

@isaacs keep us up to date, i am concerned with potential startup time increase from searching if symlinks are overly common and the fact that things become more non-deterministic than they already are.

@bmeck Yeah, the ep-46 branch is probably dead on the vine because of the nondeterminism. The startup time is likely not going to be relevant. It only adds 4-5 lstat calls total, since the stats are cached, and all modules in use are likely linked into the same store path. Now that realpath uses the system builtin, it's also a lot faster than it used to be.

@zkochan It should work fine, I'm surprised that any tests are failing. Does it work with node on master?

@isaacs is there a change in the ep-46 branch to use the native real-path? AFAIK we reverted it and didn't maintain the implementation

@isaacs I tried with master. No issues there

@thealphanerd Ah, ok. I'd thought it was using the native realpath and then falling back to the JavaScript implementation, but I guess that was just a suggestion I heard somewhere and didn't make it into core.

@zkochan Is it possible to reproduce that test in a reasonably minimal way? Anything that works on master should work on ep-46.

Just to be clear, I mean this warning in the sense of "That is going to be an exciting adventure!", and I strongly encourage you to try it, if you've got the time and are in the mood for adventure. But it definitely won't be safe, so it shouldn't be something that npm or node-core try to do :)

Thanks! And yes, it's precisely because it is crazy and out there that I'm going to try to do it in user space via monkey patching rather than by branching node core. It'll be fun. When/if I get a machine-store loader working I'll post back with a link.

@isaacs with a small fix in your ep-46 branch that you can see in this PR, pnpm tests are passing.

I checked and after your changes pnpm works with regular Node.js. No --preserve-symlinks mode is required.

I totally vote for your solution!

I noticed that @isaacs's solution uses the real path for __dirname. As a consequence, babel-cli fails because it uses __dirname in some strange way to execute a file from the same dir in a new process (what I mean: https://github.com/babel/babel/blob/master/packages/babel-cli/src/babel-node.js#L11).

This has made me thinking, should there be some new variable that will have the symlink location of the file? The symlink equivalents of __dirname and __filename. Something like __ldirname and _lfilename

I have an update regarding this thread. I've been working on changing pnpm, so that it does not rely on the --preserve-symlinks feature of Node. And my attempts have succeeded!

The latest version of pnpm (which is 0.51.2) uses a global (machine) store and works without any changes in Node.js.

We did a lot of tweaks to make it work, but the main ones are:

  • files are linked (not symlinked) from the global store to the project's node_modules
  • command shims are rewritten to set the NODE_PATH env variable before running the binstubs.

So it is achievable to create a global store without changes in Node.js and without --preserve-symlinks. And performance is good enough with linking.

Just updating this to say I no longer care about symlink support. Since we've managed to come up with two ways to create a global store without using symlinks (require-hacking and hard links) I think the global store use case is no longer a valid justification for changing the behavior of require in node core.

@zkochan @wmhilton Interesting, thanks for the update.

That still doesn't solve the symlinked peer-dependency use-case though, correct?

@isaacs Er... what was that use case again?

@wmhilton

node_modules/
โ”œโ”€โ”€ framework@ -> /path/to/framework
โ””โ”€โ”€ plugin@ -> /some/other/path/to/plugin

I do require('plugin') and then plugin does require('framework') and can't find it.

@isaacs I don't think require-hacking nor hard-linking would necessitate that situation though... @zeke is this the situation you were just describing in the webpack plugin issue on pnpm?

@isaacs your example is a little bit confusing. Plugin will most likely have framework as a peer dependency, and pnpm links peer deps to the node_modules folder of the dependent packages. So framework would be symlinked to /some/other/path/to/plugin/node_modules.

On the other hand, if framework needs to require plugin, it will work if the framework is a CLI app. The node_modules/.bin/framework cmd shim would add node_modules to the NODE_PATH env variable. So framework would find plugin. If framework is not a CLI app, probably it is used programmatically and plugins are passed into it.

Time will show, but this approach seems to work fine with apps like eslint, karma, babel

The node_modules/.bin/framework cmd shim would add node_modules to the NODE_PATH env variable.

What system writes cmd shims that update the NODE_PATH env? npm doesn't do that.

Dig up the discussions that led to --preserve-symlinks being added added originally. This is definitely a valid use case that people actually have. I don't think --preserve-symlinks is a good solution, but it is a thing that's probably worth solving.

I forked cmd-shim and added the possibility to set the NODE_PATH env. pnpm then defines the value of NODE_PATH.

This is my personal opinion, but I think the solution should be either the same as --preserve-symlinks or the current, which is resolving to real path. Other solutions would extend too much the possibilities of require. Like flat node_modules did it with npm@3. We can end up in a situation where any silly code's require will work because of the flat structure + the symlink logic. It won't matter if the dependencies were specified in package.json or not.

Dig up the discussions that led to --preserve-symlinks being added added originally. This is definitely a valid use case that people actually have.

Is this for development purposes? Or are symlinks and --preserve-symlinks used in production? I'm just confused because I've installed packages that use peer dependencies and I've never seen npm insert a symlink for them.

I don't think --preserve-symlinks is a good solution, but it is a thing that's probably worth solving.

Maybe, but I have reason to believe at this point it could be solved outside of node core.

bmeck commented

@jasnell is this still desired since --preserve-symlinks seems to be staying around?