bundler
A utility for bundling resources in web pages into a single page by inlining them using the data URI scheme.
It is designed to be easy to use in all kinds of scenarios, ranging from ones where one simply wants to produce a web page with certain resources bundled to scenarios where one would like to proxy resource requests, include caching functionality, modify request headers, and more.
Basic Usage
The simplest use case for the bundler is the case where one would like to:
- Fetch a web page, given a URL
- Replace references to certain kinds of resources with their data URIs
This can be accomplished very simply. In the following example, we can fetch Hacker News's main page and bundle the images on the site into the HTML.
var bundleMaker = new bundler.Bundler('https://news.ycombinator.com');
bundleMaker.on('originalReceived', bundler.replaceImages);
bundleMaker.bundle(function (err, bundle) {
  console.log(bundle);
});
Currently, there are five resource handlers exported in the bundler module:
- replaceImages
- replaceCSSFiles
- replaceJSFiles
- replaceLinks
- replaceURLCalls
The first three are pretty straightforward. You register all of them the same way (as with replaceImages in the example above).
replaceLinks
accepts a function that will be called to generate new links. The signature of this function is
function linkReplacer(baseURL, resourceURL) {
  return combine(baseURL, resourceURL);
}
Here, you are free to define how you combine the baseURL and resourceURL as long as you return the value at the end.
For example, you might register a link replacer to transform all URLs to the form "http://localhost:9001?url=URL".
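As a minimal sketch of such a link replacer, the function below resolves each link against the page URL and wraps it in the localhost proxy form mentioned above. The name proxyLinkReplacer and the decision to percent-encode the target URL are assumptions made for illustration; they are not part of bundler.

```javascript
// Hypothetical link replacer: routes every link through a local proxy.
// Percent-encoding the target is an assumption about what the proxy expects.
function proxyLinkReplacer(baseURL, resourceURL) {
  // Resolve relative links (e.g. "item?id=1") against the page URL.
  var absolute = new URL(resourceURL, baseURL).href;
  return 'http://localhost:9001?url=' + encodeURIComponent(absolute);
}

console.log(proxyLinkReplacer('https://news.ycombinator.com/', 'item?id=1'));
```

It would presumably be registered by passing it to replaceLinks on the originalReceived event, assuming replaceLinks returns the handler to register.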
replaceURLCalls
does not accept any parameters. It will search through the inline style
attributes of tags and bundle the resources referenced in the CSS url()
function.
See the Writing your own hooks section below for more on writing your own resource handlers.
Terminology
A diff
is an object whose key is a resource URL and whose value is a data URI.
A handler
is a function that analyzes a web page to produce diffs.
A hook
is a function that can manipulate request option data or diffs.
Using Hooks
Bundler provides five opportunities to inject new functionality into the bundling process.
- originalRequest - Before fetching the first, original document
- originalReceived - Specify handlers used to scan the document and produce diffs
- resourceRequest - Before fetching each resource referenced by the original document
- resourceReceived - After retrieving each resource referenced by the original document
- diffsReceived - After accumulating a collection of resource URLs and their data URIs
As seen above, you can register handlers using the originalReceived
event.
Before fetching the original document
Because one may wish to modify request headers, among other things, bundler
allows for hooks to be called before making a request for the original document.
For example, to replace the Referer
header with https://duckduckgo.com
:
var bundleMaker = new bundler.Bundler('https://yahoo.com');
bundleMaker.on('originalRequest', bundler.spoofHeaders({
  'Referer': 'https://duckduckgo.com'
}));
bundleMaker.on('originalReceived', bundler.replaceImages);
bundleMaker.on('originalReceived', bundler.replaceCSSFiles);
bundleMaker.bundle(function (err, bundle) {
  console.log(bundle);
});
Currently, four request modifiers are exported.
stripHeaders
stripHeaders
accepts an array of header names to replace with blank values in the
request object and returns the hook function that Bundler.on
expects.
spoofHeaders
spoofHeaders
accepts an object mapping header names to the value to insert in
their place and returns the hook function that Bundler.on
expects.
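Both stripHeaders and spoofHeaders are factories: they take configuration and return a hook with the (options, callback) signature described under Writing your own hooks. As a rough sketch of that pattern, the hypothetical defaultHeaders factory below only fills in headers the request does not already set. It is not part of bundler, and it compares header names case-sensitively for simplicity.

```javascript
// Hypothetical hook factory, not part of bundler: it sets headers only
// when the request options do not already define them.
function defaultHeaders(defaults) {
  return function (options, callback) {
    if (!options.hasOwnProperty('headers')) {
      options.headers = {};
    }
    Object.keys(defaults).forEach(function (name) {
      if (!options.headers.hasOwnProperty(name)) {
        options.headers[name] = defaults[name];
      }
    });
    // Hooks report success by passing null as the error.
    callback(null, options);
  };
}

// Exercise the hook directly with a plain options object.
var hook = defaultHeaders({ 'Accept-Language': 'en', 'Referer': 'https://duckduckgo.com' });
hook({ url: 'https://yahoo.com', headers: { 'Referer': 'https://example.org' } }, function (err, options) {
  console.log(options.headers);
});
```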
proxyTo
proxyTo
accepts a URL that requests will be configured to use as a proxy and returns the hook function that Bundler.on expects.
See request's documentation on proxies to understand how proxying works.
followRedirects
followRedirects
configures the redirect-handling options for the request object. It accepts:
- first: bool - Whether or not to follow the first redirect resulting from a request.
- all: bool - Whether or not to follow all redirects that might result from requests.
- limit: int - The maximum number of redirects to follow.
See request's options documentation to learn what each of these settings defaults to.
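For orientation, the request library exposes redirect behavior through its followRedirect, followAllRedirects, and maxRedirects options. The sketch below is a hand-written hook that sets those options directly; the name redirectPolicy is hypothetical, and the way followRedirects maps its three arguments onto these options is an assumption rather than something this document states.

```javascript
// Hypothetical hook factory that sets request's redirect options directly.
// followRedirect, followAllRedirects and maxRedirects are request options;
// the mapping from (first, all, limit) onto them is an assumption.
function redirectPolicy(first, all, limit) {
  return function (options, callback) {
    options.followRedirect = first;
    options.followAllRedirects = all;
    options.maxRedirects = limit;
    callback(null, options);
  };
}

// Follow redirects, including those for non-GET requests, up to five hops.
redirectPolicy(true, true, 5)({ url: 'https://yahoo.com' }, function (err, options) {
  console.log(options);
});
```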
Handling resources
Resource handlers are used to extract references to resources in a document, such as those in script tags, link tags, and so on. They are responsible for producing diff objects that bundler will go on to use to replace references to such resources with data URIs.
As mentioned in the Basic Usage section, resource handlers are registered
using the originalReceived
event. Bundler currently exports the following handlers.
- replaceImages
- replaceCSSFiles
- replaceJSFiles
- replaceLinks
- replaceURLCalls
Before fetching each resource
Bundler allows request options to be set for each resource that is to be retrieved.
The functions from bundler.modifyRequests
can be reused here. For example:
var bundleMaker = new bundler.Bundler('https://yahoo.com');
bundleMaker.on('resourceRequest', bundler.spoofHeaders({
  'Referer': 'https://duckduckgo.com'
}));
bundleMaker.on('originalReceived', bundler.replaceImages);
bundleMaker.on('originalReceived', bundler.replaceCSSFiles);
bundleMaker.bundle(function (err, bundle) {
  console.log(bundle);
});
Note that the only difference between this example and the one in the Before fetching the original document section is that this one registers the hook for the resourceRequest event. You can reuse hooks written for originalRequest here.
After retrieving each resource
Bundler allows hooks to be registered to directly manipulate the body of a fetched resource through the resourceReceived event. Currently, bundler exports one function for this purpose: bundleCSSRecursively, which takes no arguments. It finds calls to url() in CSS documents and replaces the URL within each with a data URI.
After building data URIs
In case one would like to prevent bundler from making certain kinds of replacements, for example if a data URI is too long or the resource is hosted on a particular site, one can register hooks to be run when the collection of diffs has been compiled.
Currently, bundler.filterDiffs
is the only existing hook exported
for this purpose. An example of filtering out resources that appear to be hosted
on google.com
:
var bundleMaker = new bundler.Bundler(url);
bundleMaker.on('originalReceived', bundler.replaceImages);
bundleMaker.on('originalReceived', bundler.replaceCSSFiles);
bundleMaker.on('diffsReceived', bundler.filterDiffs(function (src, dest) {
  return src.indexOf('google.com') < 0;
}));
bundleMaker.bundle(function (err, bundle) {
  console.log(bundle);
});
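The other case mentioned above, skipping data URIs that are too long, can be sketched the same way. So that the example runs on its own, filterDiffsSketch is a stand-in for bundler.filterDiffs written from the behavior shown in the google.com example (a diff is kept when the predicate returns true); the real implementation may differ, and the length limit is an arbitrary value chosen for illustration.

```javascript
// Stand-in for bundler.filterDiffs so this sketch is self-contained.
// It keeps a diff when the predicate returns true.
function filterDiffsSketch(predicate) {
  return function (diffs, callback) {
    var kept = {};
    Object.keys(diffs).forEach(function (src) {
      if (predicate(src, diffs[src])) {
        kept[src] = diffs[src];
      }
    });
    callback(null, kept);
  };
}

// Hypothetical limit on data URI length, in characters.
var MAX_DATA_URI_LENGTH = 32;

function shortEnough(src, dest) {
  return dest.length <= MAX_DATA_URI_LENGTH;
}

var sampleDiffs = {
  'small.png': 'data:image/png;base64,AAAA',
  'large.png': 'data:image/png;base64,' + new Array(101).join('A')
};

filterDiffsSketch(shortEnough)(sampleDiffs, function (err, diffs) {
  console.log(Object.keys(diffs));
});
```

With the real library, a predicate like shortEnough would be passed to bundler.filterDiffs and registered on the diffsReceived event, as in the example above.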
Writing your own hooks
It is, of course, possible to write your own hooks to use in any of the cases outlined above. Each type of hook has a different signature, but all are expected to behave in approximately the same way.
Before fetching the original document
Hooks to be added to the Bundler object using the originalRequest
event
should have the following form:
function handlerName(options, callback) {
  // Do something with options
  callback(err, options);
}
The callback
provided is used to iterate through hooks, and so must be called
with any error that might occur (or null
otherwise) and the modified options
object.
The options
object is the object passed to the request
library as the first argument to request
as seen in the library's
Custom HTTP Headers
documentation.
For example, we could register the following hook to increment a global count of the total number of bundle requests the server has received.
var bundlerCalls = 0;
var bundleMaker = new bundler.Bundler(url);
bundleMaker.on('originalReceived', bundler.replaceImages);
bundleMaker.on('originalRequest', function (options, callback) {
  bundlerCalls++;
  callback(null, options);
});
bundleMaker.bundle(function (err, bundle) {
  console.log(bundle);
});
Handling resources
Handlers for replacing resources in a document, like bundler.replaceImages
can also be supplied to the bundler. Such functions have the following form.
function handlerName(request, originalDoc, url, callback) {
  // findResourceURL stands for your own logic for locating a resource reference
  var resourceURL = findResourceURL(originalDoc);
  request({url: resourceURL}, function (err, response, body) {
    // produce a diff object
    var diff = { 'source-url': 'replacement' };
    callback(err, diff);
  });
}
The request
parameter is a wrapper around the request function that will invoke all hooks inserted via resourceRequest
and resourceReceived
to modify the options
object and to produce diffs for the resource before and after making the request. It accepts an
options
parameter (or resource URL) and a callback to handle the response.
The originalDoc
parameter contains the document fetched by the original request.
The url
parameter is the URL originally requested, used to produce resolved
paths to discovered resources.
The callback
parameter is used to iterate through a call to async.reduce
and must be invoked with any error that occurs (or null) and the diff object
created.
Such handlers tend to become quite complicated quickly as multiple requests will need to be made for resources. In the bundler library, async.reduce
is
used to build the diff object.
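To make the shape of such a handler concrete, here is a minimal sketch that bundles image sources. Several things in it are assumptions made for illustration: a regular expression stands in for real HTML parsing, the request wrapper is taken to yield the body as a Buffer, every image is treated as a PNG, diffs are keyed by the source string as it appears in the document, and a small recursive loop replaces async.reduce so the example has no dependencies. The fakeRequest stub lets it run without network access.

```javascript
// Sketch of a custom resource handler; not bundler's implementation.
function replaceImagesSketch(request, originalDoc, url, callback) {
  var sources = [];
  var pattern = /<img[^>]*\ssrc=["']([^"']+)["']/g;
  var match;
  while ((match = pattern.exec(originalDoc)) !== null) {
    sources.push(match[1]);
  }

  var diffs = {};
  // Fetch resources one at a time, accumulating diffs as we go.
  (function next(i) {
    if (i === sources.length) {
      return callback(null, diffs);
    }
    // Resolve the reference against the URL of the original document.
    var resourceURL = new URL(sources[i], url).href;
    request({ url: resourceURL }, function (err, response, body) {
      if (err) {
        return callback(err);
      }
      diffs[sources[i]] = 'data:image/png;base64,' +
        Buffer.from(body).toString('base64');
      next(i + 1);
    });
  })(0);
}

// Stubbed request function standing in for the wrapper bundler provides.
function fakeRequest(options, done) {
  done(null, { statusCode: 200 }, Buffer.from('img:' + options.url));
}

replaceImagesSketch(
  fakeRequest,
  '<p><img src="/a.png"><img alt="x" src="b.png"></p>',
  'https://example.com/dir/page.html',
  function (err, diffs) {
    console.log(Object.keys(diffs));
  }
);
```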
Before fetching each resource
Hooks to be added to the Bundler object using the resourceRequest
event
should have the following form:
function handlerName(options, callback, originalDocument, response) {
  // Do something with options
  callback(err, options);
}
You can reuse hooks for the originalRequest
event here.
The options
and callback
arguments here are the same as they are for the
originalRequest
handlers.
The originalDocument
argument here contains the content of the originally
fetched document.
The response
argument is the response object provided by the call to request
for the original document, which is an instance of
http.IncomingMessage.
This may be useful for obtaining response headers and the status code.
For example, you could set the Referer header of the resource request to the value of the Host header in the response.
var bundleMaker = new bundler.Bundler(url);
bundleMaker.on('originalReceived', bundler.replaceImages);
bundleMaker.on('resourceRequest', function (options, callback, doc, response) {
  if (!options.hasOwnProperty('headers')) {
    options.headers = {};
  }
  options.headers['Referer'] = response.headers['host'];
  callback(null, options);
});
bundleMaker.bundle(function (err, bundle) {
  console.log(bundle);
});
After retrieving each resource
Bundler allows hooks to be registered to directly manipulate the body of a fetched resource through the resourceReceived
event. Such hooks have
the following signature.
function handlerName(requestFn, options, body, diffs, response, callback) {
  // Make a diff object for the resource body
  callback(err, diff);
}
requestFn
is a wrapper around the request
library's exported function and can be used to fetch resources.
options
is the options object passed to the request made to fetch the resource for which the resourceReceived
event was triggered.
body
contains the string contents of the resource in question.
diffs
is a diff object containing the diffs for the resource assembled by previously-invoked hooks.
response
contains the response object corresponding to the request for the resource in question.
callback
is the async.reduce
callback and must be invoked with any error that might have occurred (or null) and the new diff object to pass onto the next hook.
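As a small illustration of this signature, the hypothetical hook below tallies how many bytes of resource content have been fetched. It adds no replacements of its own and hands the existing diffs on unchanged, which is an assumption about how a pass-through hook should behave.

```javascript
// Hypothetical resourceReceived hook, not part of bundler.
var bytesFetched = 0;

function countBytes(requestFn, options, body, diffs, response, callback) {
  // Buffer.byteLength accepts both strings and Buffers.
  bytesFetched += Buffer.byteLength(body);
  // Pass the diffs from earlier hooks on to the next hook untouched.
  callback(null, diffs);
}

// Exercise the hook directly with stand-in arguments.
countBytes(
  null,
  { url: 'https://example.com/style.css' },
  'body { margin: 0 }',
  {},
  { statusCode: 200 },
  function (err, diffs) {
    console.log(bytesFetched + ' bytes fetched so far');
  }
);
```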
After building data URIs
Hooks to be added to the Bundler object using the diffsReceived
event
should have the following form.
function handlerName(diffs, callback) {
  // Do something with diffs
  callback(err, diffs);
}
For example, the following hook counts how many images, CSS files, and JS files had replacements made.
var cssReplaces = 0;
var jsReplaces = 0;
var imgReplaces = 0;
var bundleMaker = new bundler.Bundler(url);
bundleMaker.on('originalReceived', bundler.replaceImages);
bundleMaker.on('originalReceived', bundler.replaceCSSFiles);
bundleMaker.on('originalReceived', bundler.replaceJSFiles);
bundleMaker.on('diffsReceived', function (diffs, callback) {
  var sources = Object.keys(diffs);
  for (var i = 0, len = sources.length; i < len; ++i) {
    switch (bundler.mimetype(sources[i])) {
      case 'text/css':
        cssReplaces++;
        break;
      case 'application/javascript':
        jsReplaces++;
        break;
      default:
        imgReplaces++;
    }
  }
  callback(null, diffs);
});
bundleMaker.bundle(function (err, bundle) {
  console.log(cssReplaces + '\t CSS files replaced.');
  console.log(jsReplaces + '\t JS files replaced.');
  console.log(imgReplaces + '\t Image files replaced.');
});
Helper functions
To make writing handlers and hooks a little bit easier, Bundler exports the following functions.
bundler.mimetype(url)
Infers, where possible, the mimetype of a resource based on its URL. It is better to determine this information from the Content-Type header in a response; however, this function is provided as a useful helper.
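To illustrate what URL-based inference involves, here is a sketch that maps file extensions to mimetypes. It is not bundler's implementation: the lookup table, the null return for unknown extensions, and the name mimetypeSketch are all assumptions.

```javascript
var path = require('path');

// Illustrative lookup table; the extensions bundler actually recognizes
// are not documented here.
var TYPES_BY_EXTENSION = {
  '.css': 'text/css',
  '.js': 'application/javascript',
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.jpg': 'image/jpeg'
};

function mimetypeSketch(resourceURL) {
  // The placeholder base lets relative URLs parse; pathname excludes
  // any query string or fragment.
  var pathname = new URL(resourceURL, 'http://placeholder.invalid/').pathname;
  var extension = path.extname(pathname).toLowerCase();
  return TYPES_BY_EXTENSION.hasOwnProperty(extension)
    ? TYPES_BY_EXTENSION[extension]
    : null;
}

console.log(mimetypeSketch('https://example.com/a/style.css?v=2'));
```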
bundler.dataURI(response, baseURL, content)
Produces the data URI for a resource given the response object corresponding to the request for the resource, the base URL (e.g. www.google.com) for the resource, and the content of the resource as a Buffer object.
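For readers unfamiliar with the data URI scheme, the sketch below shows how a base64 data URI can be assembled from a response-like object and a Buffer. It is an approximation only: the fallback type is an assumption, and the baseURL argument of the real function is left out because its role is not described here.

```javascript
// Sketch only; not bundler's implementation of dataURI.
function dataURISketch(response, content) {
  var type = (response.headers && response.headers['content-type']) ||
    'application/octet-stream';
  // Drop parameters such as "; charset=utf-8".
  type = type.split(';')[0].trim();
  return 'data:' + type + ';base64,' + content.toString('base64');
}

console.log(dataURISketch(
  { headers: { 'content-type': 'text/css; charset=utf-8' } },
  Buffer.from('hi')
));
```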
bundler.strReplaceAll(string, str1, str2)
Replaces all instances of str1
in string
with str2
.
bundler.applyDiffs(string, diffs)
Applies all the replacements provided by a diff object to a given string.
For example, the string "abc"
with diff {'c': 'd'}
would become "abd"
.
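The behavior of these two helpers can be approximated in a few lines. The functions below are sketches written for illustration, not the library's code; they treat every key as a literal string rather than a pattern.

```javascript
// split/join treats str1 as a literal string, so characters that are
// special in regular expressions need no escaping.
function strReplaceAllSketch(string, str1, str2) {
  return string.split(str1).join(str2);
}

// Apply each replacement in the diff object in turn.
function applyDiffsSketch(string, diffs) {
  return Object.keys(diffs).reduce(function (result, source) {
    return strReplaceAllSketch(result, source, diffs[source]);
  }, string);
}

console.log(applyDiffsSketch('abc', { 'c': 'd' }));
```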
bundler.htmlFinder(source, selector, attr)
Returns a function that accepts a callback. When called, that function scans through the HTML document source and invokes the callback with the value of the attribute attr of each element matched by the provided selector. This works using the Cheerio library, so selectors should be supplied accordingly.
bundler.cssReferenceFinder(source)
Like htmlFinder
, will return a function that accepts a callback. The callback will be invoked with the URL found within all instances of calls to
url()
in the CSS document source
provided.
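The finder pattern described here (a function that returns a callback-accepting function) can be sketched as follows. A regular expression stands in for whatever parsing bundler performs, so this is an approximation under that assumption and cssReferenceFinderSketch is a hypothetical name.

```javascript
// Sketch of a finder for url() references in CSS; not bundler's code.
function cssReferenceFinderSketch(source) {
  return function (callback) {
    // Matches url(x), url('x') and url("x"), allowing inner whitespace.
    var pattern = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
    var match;
    while ((match = pattern.exec(source)) !== null) {
      callback(match[2]);
    }
  };
}

var found = [];
var finder = cssReferenceFinderSketch(
  'a { background: url("a.png") } b { background: url( b.gif ) }'
);
finder(function (resourceURL) {
  found.push(resourceURL);
});
console.log(found);
```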
bundler.replaceAll(request, url, finder, callback)
Uses a finder (the callback-accepting function provided by a call to either htmlFinder
or cssReferenceFinder
) to identify and request resources. The url
argument must be the URL of the original document (e.g. www.google.com). The callback will be invoked with any error that occurs (or null) and a diff object.