LinkedInAttic/inject

Internally scoped require() and define() cause false downloads

cburgmer opened this issue · 8 comments

I am trying to use inject together with browserify. The latter is a perfect tool of compiling all local sources with CommonJS requires into one big distributable bundle. Inject can help complete such a setup when setting up a simple development environment without the need to constantly repackage the bundle with browserify.

However, importing existing bundles made by browserify triggers Inject to load resources contained inside those and resulting in a lot of 404s. This might not only load wrong resources, but slows down the initial loading phase a lot.

The proper solution for Inject would be not to parse the require statements that are offered by the browserify package itself.

(EDIT: Updated title to reflect the root of the issue)

I think the cause of this is our scanner is straight regex. It doesn't really discriminate within scopes for performance reasons. I have some design ideas I'd like to propose here to see if any of them work.

  1. Some kind of "do not scan" identifier we could add pragma-style comments to a file for inject which could aid the parser during extractRequires. For example, // # inject-no-scan could tell the parser there are no dependencies in this file (even if the require() statement would have been found). We could also write this in such a way that if we find we need additional pragmas, they would be easy to add.
  2. Make "don't scan this file" rules the RulesEngine could support this, although I'd like to keep the rules engine stuff small (there's already quite a few rule based methods right now)

I'm also open to any ideas you may have for how we can prevent a browserify package from being scanned. (We originally had a full on AST parser, but the performance was really poor compared to straight regexes on larger files)

Interesting, I was going to propose something along what browserify is doing, but then they do not have any performance concerns.

To 1.: Obvious problem: we would need to add such a pragma to auto-generated files.
To 2.: If using inject as a dev environment replacement for browserify I already need a handful of rules, as some imports are local and some are from node_modules/, so another rule might not matter. Although this should make us probably rather think about improving support there, too

How about: 3. A dirty regex, that checks for browserify-built packages explicitly? Probably not a good idea though.

Number 3 was just ruled out in a work conversation today. It turns out the built version of Bluebird actually has an internal require() function that trips this up too. So we can't just go testing for browserify (plus that feels kinda wrong).

Thankfully, Inject has a simple content manipulation system we can exploit. The following code examples would technically work, though I think we should openly talk about what the right solution would be.

Pragma Example: In this example, we are adding a content rule to modify the file after it has been retrieved, but before it has been executed. We would also need to make a change to the Analyzer object so that it can strip out noscan blocks during it's extractRequires method. The idea of a pragma is tempting since this is kind of the first of what could be several different ways to control the way Inject handles the file. Also proposed long term would be pragmas for globally available compile functions, or possibly even local callbacks, eliminating the need for Inject's internal plugin system.

Inject.addContentRule(/browserify-package/, function(next, contents) {
  next(['\n"#INJECT_NOSCAN";\n', contents, '\n"#INJECT_ENDNOSCAN";\n'].join(''));
});

// alternative if we made a helper that lets us internalize the strings.
Inject.addContentRule(/browserify-package/, function(next, contents) {
  next(Inject.pragma(INJECT.pragma.noscan, contents));
  // or
  next(Inject.pragma.noscan(contents));
});

Rrrrrrrequire! (We were playing the new Killer Instinct game at lunch here at work): In this idea, instead of trying to create a "noscan" block, we simply sub out the "require" function within the package for something that works just as well. It would end up looking something like the below. While this would be safer for Inject (no engine changes), I also feel like this could eventually mess up developers. On the plus side, it's available now.

Inject.addContentRule(/browserify-package/, function(next, contents) {
  next(contents.replace(/([\w\);])require/g, '$1rrrequire); // regex approximated
});

I'm going to add @asakusuma to the thread as well, who ran into the Bluebird problem to get his thoughts also. We actually ran into this bug at about the same time right before the holiday break. Thanks @cburgmer for helping us think through the issue.

Having thought about this over the break, I'm liking the pragma idea more and more. People could always pre-optimize their items if they want, add it in a build step, etc. Or, they can use more complex rulesets to do it all at runtime.

Here's some API ideas:

  • Inject.pragma._______(content): Wraps content in the specified pragma
  • Inject.pragma(name, content): Wraps content in the specified pragma name
  • no API: make the strings part of the docs, and leave it up to the user

Here's some string ideas:

  • "#INJECT PRAGMANAME" "#INJECT END PRAGMANAME": Feels more like #IFDEF/#ENDIFDEF syntax, may be more familiar to the C++ crowd
  • "use inject pragmaname": Feels more like "use strict" which while this may feel more familiar, may imply that there is some kind of lexical scoping, similar to how the "use strict" directive works. Even with some clever regex, while you could find where the parent function begins, finding its end would prove incredibly difficult.

I'll add more as I think of them.

I'm cool with implementing the pragma solution. Another pragma syntax option is using comments similar to how VenusJS does annotations for test files: https://venusjs.readthedocs.org/en/latest/reference/annotations.html

Also, I think there should be an option to use a simpler API like:

Inject.noScan('/browserify-package/*');

@asakusuma I thought about the Inject.noscan option, but the API is already pretty complex. We have

  • addFileRule
  • addModuleRule
  • addContentRule
  • addFetchRule
  • addPackage

I'd much rather us use a more generic set of processing helpers to be used inside of content rules. Otherwise, we risk growing the number of public methods quite a bit should the pragma idea take off.

Annotations suffer from the same problem as the use inject syntax. Without loading an AST, it would be very difficult to locate the function's end. Running on the server, venus has the advantage of being able to load the Esprima parser for their AST tool, making it easy to extract comments and therefore annotations.

@jakobo makes sense. When the dust settles we may find that there are specific, often-used combinations of the API that warrant a convenience function like noScan().

Annotations would be at the top of the file and would affect the entire file. I agree that wielding pragmas with lexical scope is too complicated at the moment.

At the local Inject meeting, we talked through the ideas, and we settled on the following:

  • Inject.pragma.noScan(content) will wrap content with noscan pragmas
  • The noscan pragmas are demarqued with \n/*#INJECT PRAGMANAME*/\n and \n/*#INJECT END PRAGMANAME*/\n
  • Analyzer will modify the scan to strip our first pragma, the NOSCAN pragma, similar to how we do block comment stripping.
  • Files such as bluebird / browserify will simply need a content rule that puts a noscan pragma around them. This keeps in line with Inject's philosophy of not modifying the source files.