A plugin for webcheck to prevent multiple downloads of the same resource.
npm install --save webcheck-crawl-once
var Webcheck = require('webcheck');
var CrawlOncePlugin = require('webcheck-crawl-once');
var plugin = CrawlOncePlugin({
    // filterUrl: /\.html/,
    ignoreQuery: false
});
var webcheck = new Webcheck();
webcheck.addPlugin(plugin);
plugin.enable();
// now continue with your code...
filterUrl
: Filter the URLs that should be crawled only once (default: all URLs).
ignoreQuery
: Ignore the query string in URLs.
Filters are regular expressions, but the plugin only calls their .test(str)
method to check a value. You can therefore supply your own, more complex
logic by passing any object that implements a test method, like this:
var opts = {
    filterSomething: {
        // Called with the value to check; return true to match, false otherwise.
        test: function (val) {
            return true;
        }
    }
};
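As a slightly more concrete sketch of the same idea, the object below could be passed as filterUrl so that only HTML pages on one host are crawled once. The host name `example.com` and the variable name `htmlOnExampleHost` are made up for illustration, and the sketch assumes the filter receives the absolute URL as a string.

```javascript
// Hypothetical filter: matches only .html/.htm pages on example.com.
// Any object with a test(str) method can stand in for a RegExp.
var htmlOnExampleHost = {
    test: function (url) {
        var parsed;
        try {
            parsed = new URL(url);
        } catch (err) {
            return false; // not an absolute URL, so never match
        }
        return parsed.hostname === 'example.com' &&
            /\.html?$/.test(parsed.pathname);
    }
};

var opts = {
    filterUrl: htmlOnExampleHost,
    ignoreQuery: false
};
```

Because the check runs on the parsed path rather than the raw string, a query string such as `?page=2` does not affect whether the URL matches.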
reset(undefined | url)
: Reset a specific URL, or the complete ignore list.
ignore(url)
: Add a resource to the ignore list.
check(url)
: Check whether a resource is ignored.
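To make the semantics of these three methods concrete, here is a small stand-in model of an ignore list. It is not the plugin's source: `createIgnoreList` is a hypothetical helper, and the way the query string is stripped when ignoreQuery is set is an assumption about the behaviour described above.

```javascript
// Stand-in model only, NOT the plugin's implementation.
// createIgnoreList is a hypothetical helper that mirrors the
// documented reset/ignore/check methods.
function createIgnoreList(ignoreQuery) {
    var seen = {};

    // Assumption: with ignoreQuery set, everything after "?" is dropped.
    function key(url) {
        return ignoreQuery ? url.split('?')[0] : url;
    }

    return {
        // Add a resource to the ignore list.
        ignore: function (url) {
            seen[key(url)] = true;
        },
        // Check whether a resource is on the ignore list.
        check: function (url) {
            return seen[key(url)] === true;
        },
        // Reset one URL, or the whole list when called without arguments.
        reset: function (url) {
            if (url === undefined) {
                seen = {};
            } else {
                delete seen[key(url)];
            }
        }
    };
}

var list = createIgnoreList(true);
list.ignore('http://example.com/page?x=1');
var sameResource = list.check('http://example.com/page?x=2'); // query ignored
list.reset();
var afterReset = list.check('http://example.com/page?x=1');
```

In this model `sameResource` is true because both URLs share the same path once the query is dropped, and `afterReset` is false because reset() without an argument clears the whole list.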