istanbuljs/nyc

extract require hook logic into an external module

bcoe opened this issue ยท 35 comments

bcoe commented

@novemberborn has done a great job of generalizing the logic around extending require hooks, I think this logic would be a great candidate for pulling into its own module -- with a thorough test suite.

CC: @jamestalmage, @ariporad

@bcoe: Can I take this on? I'd like to give it a shot.

@sebmck, @gotwarlost: If I/we wrote a library for properly messing with require, would you switch to using it in your projects? I'd be happy to submit PRs if needed. (@gotwarlost, feel free to read babel/babel#3062 for some more context).

Also, (sorry for so many separate comments), what about naming it pirates, since your hijacking require, and it involves hooks? The name is available. We could also do requiarrr/requiargh, but that kind of makes me cringe.

@bcoe: (Sorry, I promise this will be the last one): here, multiple require hooks won't work together (like calling super) will they? They'll just override each other (while maintaining nyc's stuff), right? If not, what am I missing?

bcoe commented

@ariporad I like boring old, https://www.npmjs.com/package/require-wrap but I'm weird that way ๐Ÿ‘

I think the logic @jamestalmage put into the require-hook is quite sound, I would use this as a jumping off point -- it overrides the getters/setters on require.extensions, and sets aside other hooks that folks try to register, so that you end up with a stack.

There's been some discussion with @davglass, and @gotwarlost about moving some of the istanbul modules into an org, I think if you developed this require-hook library inside that org, I'd be more likely to pull it in to nyc -- I'd like to make sure that myself and a few other key members of nyc can easily contribute and release updates to that module -- since it's a core part of what we do.

@bcoe: (In order):

  1. We'll discuss the name
  2. Yes, it very much is. I defiantly plan to use that as a starting point, I just want to make it support more than one hook.
  3. It's already been done: https://github.com/istanbuljs.
  4. I'm happy to let you contribute, although I'm not quite sure what you mean by "pull it in to nyc". (I also think that it might be out of scope for the istanbul org, but I'm happy to put it there if people want) Here's what I'm hoping the API will be like:
var pirates = require('pirates'); // This auto-calls `pirates/register`, see below.

// ...

pirates.addCompiler('.js', function (module, filename) {
    return this.addFile(module, filename).content; // Just return the new source for the module.
});

Pirates will take care of the rest, including support for auto-fixing modules that directly modify require._extensions. People who's projects include one or more require hooks that don't support pirates can do something like this:

require('pirates/register');
// For some weird reason, part of my app is in coffee script, and some is in ES6.
require('coffee-script/register');
require('babel-core/regsister');
require('some-bad-module/register');
bcoe commented

@ariporad awesome, hadn't noticed istanbul had been moved into an org like this ๐Ÿ‘

It might be worth combining efforts and extending:

https://github.com/istanbuljs/istanbul-lib-hook

the main problem being that babel kicks this hook out of the way.

@bcoe: The trouble is that istanbul-lib-hook deals with a lot of istanbul specific things. I'm going to try to do something with the code from nyc, I should have something tomorrow night if not before.

Actually istanbul-lib-hook doesn't do anything that is istanbul specific.

Also look at the require hooking mechanism that has changed courtesy a PR from @blakeembrey

It allows for custom module extension loaders and hooks into module._compile much like the nyc hook

I think the logic @jamestalmage put into the require-hook is quite sound

Thanks for the credit, but it wasn't me. @novemberborn maybe?

istanbul-lib-hook looks very promising. I think the best strategy would be trying it in nyc, AVA, Babel etc. If we run into obstacles, we should try and fix them in istanbul-lib-hook before spinning off duplicate efforts.

I will handle AVA integration.

@gotwarlost: Wow, I certainly just skimmed that... Never mind. Although, it still doesn't support multiple hooks using it. (They would just override each other). I'll submit a PR that fixes that. (I'd still like to advocate for changing the name to pirates, but...)

I'll take babel.

@gotwarlost, @bcoe: Ok: I submitted istanbuljs-archived-repos/istanbul-lib-hook#2. It supports multiple modules using it, but has no support what so ever for other modules hijacking require without it. I can add that, but it feels totally out of scope for that project.

Ok, (I lied about the stopping the flow of comments bit): I think the best course of action is to close istanbuljs-archived-repos/istanbul-lib-hook#2, and someone (unless a volunteer comes forward before I get to it, I'll start, but I'm open to anyone) should combine istanbul-lib-hook and the nyc hook logic into one package (still advocating for calling it pirates), which supports:

  1. Super easy require-hook injection, with a super simple API that Just Works(tm), feeding hooks into each other in the order they are added. API concept:

    var pirates = require('pirates');
    
    // ...
    
    pirates.addCompiler('.js', function (filename) {
       return filename.indexOf('foo') !== -1;
    }, function (module, filename) {
        return this.addFile(module, filename).content; // Just return the new source for the module.
    });
  2. Using getters/setters, automatic fixing of hooks that don't use pirates, and still passing all requires through the hooks in order. API concept:

    require('pirates/register');
    // For some weird reason, part of my app is in coffee script, and some is in ES6.
    require('coffee-script/register');
    require('babel-core/regsister');
    require('some-bad-module/register');
  3. ???

That should allow keeping it relatively simple, and making it easy to adopt.

Sound good to everyone? Any objections/things I should change? (Also, if everyone's ok with it, I'd like to take it on, as I've never contributed (significantly) to an open source project that anyone actually uses before, and I'd like the experience).

bcoe commented

@ariporad I think you should take a crack at this, definitely a problem that everyone re-invents the wheel to solve -- I'm sure folks would be happy to move towards a well tested solution that covers the various edge-cases folks bump into.

Let me know anything I can do to help.

Oh, I almost forgot: @gotwarlost, @bcoe: can I have permission to (with attribution) modify and redistribute (well, really just take ideas from) your code? Thanks! (I'm going to release this under the MIT license, FYI)

@ariporad sounds cool! Let me know if you run into any trouble.

Ok everyone, let's move this discussion to the repo for the project, danez/pirates#1.

Ok, @bcoe, @gotwarlost, @novemberborn, @jamestalmage, @sebmck: I got pirates mostly done! It's now available on npm, and I've opened pull requests for babel (babel/babel#3139) and istanbul (istanbuljs-archived-repos/istanbul-lib-hook#5), and a PR for nyc is in the works. If you have any other modules you like that use require hooks, convince them to adopt pirates! It's super easy and reduces complexity!

After looking into having getters and setters for dealing with naughty require hooks, I ended up dropping that idea (for now), because basically any hook that misbehaves is using Module.prototype._compile instead of module._compile, which I haven't been able to figure out how to fix.

Enjoy!

I think there is some fundamental confusion here about why it was necessary for nyc to implement the getter/setter hook, I will try and explain:

For starters, I wrote this gist that is a step by step walkthrough of how a stack of pirates style extension works. The code flow is surprisingly complex when you peel it all apart. While it's long, I think it is worth a read to fully grok the chain of events.

The TL;DR version: Each extension applies its transform in the order installed. So if you add an extension that appends foo to every file, then one that installs bar to every file, you end up appending foobar to everything.

This is problematic for nyc and source-map-support. We need to make sure our transform is last. So we can apply the istanbul coverage transform on the final output, as well as store the incoming source-map information from all the previous transforms.

This is essentially impossible to guarantee, but @novemberborn's solution is the most elegant solution I have seen so far. It essentially uses a getter, setter pair to ensure that, even when future extensions replace the nyc extension, they can be wrapped so nyc is always last..

I have successfully generalized it. Doing so was a bit more complex than I expected, but it reliably ensures that your transform is always able to be applied last. It ultimately relies on every extension forwarding down the chain (Module.prototype._compile.apply(...) in a later install installed hook a sure fire way to break it).

See the test suite here. (Ignore the current module name, I don't like it).

@bcoe - The tests have 100% coverage, btw. I'll have a coveralls badge once I decide what to name the freaking thing.

This is essentially impossible to guarantee, but @novemberborn's solution is the most elegant solution I have seen so far.

Reading your walkthrough I'm thinking that may have been by accident! ๐Ÿ˜‰

Doing so was a bit more complex than I expected, but it reliably ensures that your transform is always able to be applied last. It ultimately relies on every extension forwarding down the chain (Module.prototype._compile.apply(...) in a later install installed hook a sure fire way to break it).

I guess nyc gets away with using Module.prototype._compile because it's the last transform in the chain, which is both out of necessity and because it is guaranteed to be installed first.

I guess nyc gets away with using Module.prototype._compile because it's the last transform in the chain,

Sort of. From my reading of the source code, I think things will break down if two or more transforms are installed on top of nyc. I may be missing something.

Actually, @jamestalmage, they're run in reverse order, so as long as nyc can register it's hook first, it will always be last.

Actually, @jamestalmage, they're run in reverse order, so as long as nyc can register it's hook first, it will always be last.

Nope. I have thoroughly verified that they do execute in order through multiple tests. I was certainly confused when I realized this. At first glance at the code, it does seem the order would be reversed. Read this gist to fully understand why/how they execute in order.

Wow, yeah, you're right. My bad. Sorry.

No problem. I started with the same assumption you did.

Ok, @bcoe, @jamestalmage, @novemberborn: Sorry, I totally misunderstood what you wanted. I'll try to build this into pirates, it doesn't seem unreasonable to want to do, and I'll try to do it in a way that allows a chain.

Hmm... Thinking about this: I'm probably going to implement this as an option (ie. { last: true }). What would you expect to happen if two modules both try to do this? Should it throw an error?

I think it's a bad idea to build it into pirates. The need for this is incredibly niche, and the implementations are entirely separate (virtually no repetition between the two). That alone begs for separately deployed modules.

I want to be able to just point people at pirates and say, "that is the way to extend require". Providing that option will lead to people using it ("certainly MY transform is the most important and should go last so it can transform all the things"). I think it would be far better if the few people who actually need this behavior use an alternate solution (like capture-require).

The last behavior has troubling consequences if widely used. Most importantly, it takes control away from the user.

Let's say, as a user, I actually want this ordering:

require('foo/register');
require('bar/register');
require('baz/register');

But if bar/register uses the last behavior, you have taken that control away from me.

It is reasonable for nyc and istanbul to use a last behavior. They would likely be broken by additional transforms, and they need to see the final source-map. Being able to do so is a condition of their operation.

I think pirates should focus on settling the remaining API questions, and providing an extremely thorough test suite for it's stated goal of properly extending require.

@jamestalmage: Fair enough. But at some point (soon) I'd like to fix NYC's logic. I'm not sure exactly the way to go, but I'm guessing it's to wrap Module.prototype.load.

after reading this thread, I feel like sharing this link: https://xkcd.com/927/ :)

@gotwarlost - Love that one, and it goes to the point I am making. I don't really want wide adoption of capture-require, except by those people who really need it (so far coverage tools is the only good example I have found). Baking that behavior into pirates is an invitation for people to use it. I am perfectly content posting a big warning in the capture-require readme: "go use pirates", it will contain some of the same arguments I make here.

@jamestalmage: Ok, that sounds great.. How ever I really do insist that it properly supports a require chain. I'm pretty sure the way to go about that is to wrap Module.prototype.load, then things like pirates will still work.

capture-require will do just that. The test suite proves it.

@jamestalmage: oh, thanks! You rock!