mapreduce vector tile processing

TileReduce

TileReduce is a geoprocessing library that implements MapReduce to let you run scalable distributed spatial analysis using JavaScript and Mapbox Vector Tiles. TileReduce coordinates tasks across all available processors on a machine, so tiles are processed in parallel rather than one at a time.

Install

npm install tile-reduce

Usage

A TileReduce processor is composed of two parts: the "map" script and the "reduce" script. The "map" script contains the expensive processing you want to distribute, while the "reduce" script performs the quick aggregation step.

'map' script

The map script operates on each individual tile. Its purpose is to receive one tile at a time, analyze or process that tile, write any output data, and send results to the reduce script.

See the count example processor's map script
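As a rough illustration of the shape a map script can take, the sketch below counts the features in one layer of one source. It assumes the map script exports a function called with (data, tile, writeData, done), and that the converted GeoJSON for a tile is exposed as data[sourceName][layerName]. The source name osmdata and layer name osm are borrowed from the MBTiles example in this document; countFeatures is a hypothetical name.

```javascript
'use strict';

// Sketch of a map script (map.js).
// Assumed signature: (data, tile, writeData, done).
// Assumed layout: data[sourceName][layerName] is a GeoJSON FeatureCollection.
function countFeatures(data, tile, writeData, done) {
  // The source or layer may be absent when a tile is empty.
  var layer = data.osmdata && data.osmdata.osm;
  var count = layer ? layer.features.length : 0;

  // First argument: an error, if any. Second: the result for the reduce script.
  done(null, count);
}

module.exports = countFeatures;
```

Keeping the per-tile result small (here a single number) keeps the reduce step cheap.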

'reduce' script

The reduce script serves both to initialize TileReduce with job options, and to handle reducing results returned by the map script for each tile.

See the count example processor's reduce script
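The aggregation half of a reduce script can be sketched as a small helper that listens to the reduce and end events described under Events. The helper name sumResults and its onTotal callback are hypothetical; the commented wiring at the bottom only reuses options documented in this README and assumes tile-reduce is installed.

```javascript
'use strict';

// Sketch: sum the per-tile results emitted by a TileReduce job.
// `job` is the EventEmitter returned by tilereduce({...}).
function sumResults(job, onTotal) {
  var total = 0;

  job.on('reduce', function (result) {
    total += result; // result is whatever the map script passed to done()
  });

  job.on('end', function () {
    onTotal(total); // every queued tile has been processed
  });

  return job;
}

// Hypothetical wiring (not executed here):
// var tilereduce = require('tile-reduce');
// var path = require('path');
// sumResults(tilereduce({
//   zoom: 15,
//   map: path.join(__dirname, 'map.js'),
//   sourceCover: 'osmdata',
//   sources: [{name: 'osmdata', mbtiles: path.join(__dirname, 'latest.planet.mbtiles'), layers: ['osm']}]
// }), function (total) { console.log('Total count was: ' + total); });
```

This version assumes each map result is a number; a different result shape would need a different accumulator.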

Options

Basic Options

zoom

zoom specifies the zoom level of tiles to retrieve from each source.

tilereduce({
	zoom: 15,
	// ...
})

map

Path to the map script, which will be executed against each tile.

tilereduce({
	map: path.join(__dirname, 'map.js'),
	// ...
})

maxWorkers

By default, TileReduce creates one worker process per CPU. maxWorkers may be used to limit the number of workers created.

tilereduce({
  maxWorkers: 3,
  // ...
})

output

By default, any data written from workers is piped to process.stdout on the main process. You can pipe to an alternative writable stream using the output option.

tilereduce({
	output: fs.createWriteStream('output-file'),
	// ...
})
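On the worker side, data reaches this output stream through the map script. The sketch below writes one JSON line per feature; it assumes the map function receives (data, tile, writeData, done) and that writeData accepts a string. The names writeFeatures, osmdata, and osm are illustrative, the latter two taken from the MBTiles example in this document.

```javascript
'use strict';

// Sketch of a map script that streams newline-delimited JSON to the output.
// Assumptions: signature (data, tile, writeData, done); writeData takes a string;
// data.osmdata.osm is a GeoJSON FeatureCollection.
function writeFeatures(data, tile, writeData, done) {
  var layer = data.osmdata && data.osmdata.osm;
  var features = layer ? layer.features : [];

  features.forEach(function (feature) {
    writeData(JSON.stringify(feature) + '\n'); // one feature per line
  });

  // Report how many features were written for this tile.
  done(null, features.length);
}

module.exports = writeFeatures;
```

Writing one record per line keeps the combined output easy to split, since lines from different workers may arrive interleaved.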

log

Set log to false to disable logging and progress output.

tilereduce({
	log: false,
	// ...
})

mapOptions

Passes arbitrary options through to workers. Options are made available to map scripts as global.mapOptions.

tilereduce({
	mapOptions: {
		bufferSize: 4
	},
	// ...
})
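Inside the map script, these options can then be read from global.mapOptions. The sketch below uses the bufferSize option from the example above; the function name mapWithOptions, the fallback value of 1, and the (data, tile, writeData, done) signature are assumptions made for illustration.

```javascript
'use strict';

// Sketch of a map script that reads options passed via mapOptions.
function mapWithOptions(data, tile, writeData, done) {
  var options = global.mapOptions || {};

  // Fall back to a default when the option is not supplied (the default is an assumption).
  var bufferSize = options.bufferSize !== undefined ? options.bufferSize : 1;

  // Return the value alongside the tile so the reduce script can inspect it.
  done(null, {tile: tile, bufferSize: bufferSize});
}

module.exports = mapWithOptions;
```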

Specifying Sources

Sources are specified as an array in the sources option:

tilereduce({
	sources: [
		/* source objects */
	],
	// ...
})

MBTiles

sources: [
  {
    name: 'osmdata',
    mbtiles: __dirname+'/latest.planet.mbtiles',
    layers: ['osm']
  }
]

MBTiles work well for optimizing tasks that request many tiles, since the data is stored on disk. Create your own MBTiles from vector data using tippecanoe, or use OSM QA Tiles, a continuously updated MBTiles representation of OpenStreetMap.

URL

Remote Vector Tile sources accessible over HTTP work well for mashups of datasets and for datasets that would not be practical to fit on a single machine. Be aware that HTTP requests are slower than reading MBTiles from disk, and throttling is typically required to avoid disrupting servers at high tile volumes. maxrate dictates how many requests per second will be made to each remote source.

sources: [
  {
    name: 'streets',
    url: 'https://b.tiles.mapbox.com/v4/mapbox.mapbox-streets-v6/{z}/{x}/{y}.vector.pbf',
    layers: ['roads'],
    maxrate: 10
  }
]

raw

By default, sources will be automatically converted from their raw Vector Tile representation to GeoJSON. If you set raw: true in an MBTiles or URL source, the raw Vector Tile data will be provided, allowing you to lazily parse features as needed. This is useful in some situations for maximizing performance.

sources: [
  {
    name: 'streets',
    url: 'https://b.tiles.mapbox.com/v4/mapbox.mapbox-streets-v6/{z}/{x}/{y}.vector.pbf',
    raw: true
  }
]
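A map script can take advantage of raw layers by checking cheap properties first and converting geometry only for the features it keeps. The sketch below assumes that raw layers expose the vector-tile-js interface (layer.length, layer.feature(i), feature.properties, feature.toGeoJSON(x, y, z)) and that the map function receives (data, tile, writeData, done). The roads layer name comes from the URL example; the class value 'motorway' and the name collectMotorways are illustrative.

```javascript
'use strict';

// Sketch of lazy parsing in a map script when the source sets raw: true.
// Assumption: data.streets.roads behaves like a vector-tile-js VectorTileLayer.
function collectMotorways(data, tile, writeData, done) {
  var layer = data.streets && data.streets.roads;
  var matches = [];

  for (var i = 0; layer && i < layer.length; i++) {
    var feature = layer.feature(i);

    // Filter on properties before paying for geometry conversion.
    if (feature.properties.class === 'motorway') {
      // tile is [x, y, z], which matches the toGeoJSON(x, y, z) argument order.
      matches.push(feature.toGeoJSON(tile[0], tile[1], tile[2]));
    }
  }

  done(null, matches.length);
}

module.exports = collectMotorways;
```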

Specifying Job Area

Jobs run over a geographic region represented by a set of tiles. TileReduce also accepts several area definitions that will be automatically converted into tiles.

BBOX

A valid bounding box array.

tilereduce({
	bbox: [w, s, e, n],
	// ...
})
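To give a feel for how a bounding box maps onto tiles at a given zoom, here is a minimal sketch using standard Web Mercator tile arithmetic. It is not TileReduce's own conversion code; pointToTile and bboxToTiles are hypothetical helpers, and the sketch assumes latitudes within the Web Mercator range and a bbox that does not cross the antimeridian.

```javascript
'use strict';

// Convert a longitude/latitude pair to the [x, y, z] tile containing it.
function pointToTile(lon, lat, zoom) {
  var n = Math.pow(2, zoom); // tiles per axis at this zoom
  var latRad = lat * Math.PI / 180;
  var x = Math.floor((lon + 180) / 360 * n);
  var y = Math.floor((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * n);

  // Clamp so that lon = 180 or near-polar latitudes stay inside the grid.
  return [Math.min(Math.max(x, 0), n - 1), Math.min(Math.max(y, 0), n - 1), zoom];
}

// List every tile touched by a [w, s, e, n] bounding box at one zoom level.
function bboxToTiles(bbox, zoom) {
  var topLeft = pointToTile(bbox[0], bbox[3], zoom);     // west, north
  var bottomRight = pointToTile(bbox[2], bbox[1], zoom); // east, south
  var tiles = [];

  for (var x = topLeft[0]; x <= bottomRight[0]; x++) {
    for (var y = topLeft[1]; y <= bottomRight[1]; y++) {
      tiles.push([x, y, zoom]);
    }
  }
  return tiles;
}
```

The tile count grows roughly fourfold with each zoom level, so a bbox that is cheap at a low zoom can produce a very large job at zoom 15.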

GeoJSON

A valid GeoJSON geometry of any type.

tilereduce({
	geojson: {"type": "Polygon", "coordinates": [/* coordinates */]},
	// ...
})

Tile Array

An array of tiles, each represented as an [x, y, z] array.

tilereduce({
	tiles: [
		[x, y, z]
	],
	// ...
})

Tile Stream

Tiles can be read from an object mode node stream. Each object in the stream should be either a string in the format x y z or an array in the format [x, y, z].

tilereduce({
	tileStream: /* an object mode node stream */,
	// ...
})
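One simple way to obtain an object mode stream is Node's Readable.from(), available since Node.js 12, which wraps any iterable. The sketch below is only an illustration; tileStreamFrom is a hypothetical helper and the tile coordinates are placeholders.

```javascript
'use strict';

var Readable = require('stream').Readable;

// Readable.from() creates an object mode stream from an iterable by default,
// so each [x, y, z] array is emitted as a single object.
function tileStreamFrom(tiles) {
  return Readable.from(tiles);
}

// Placeholder tiles in [x, y, z] form.
var tileStream = tileStreamFrom([
  [5276, 12757, 15],
  [5277, 12757, 15]
]);

// tileStream could then be passed as the tileStream option:
// tilereduce({ tileStream: tileStream, /* ... */ })
```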

Line-separated tile list files can easily be converted into the appropriate object mode streams using binary-split:

var split = require('binary-split'),
	fs = require('fs');

tilereduce({
	tileStream: fs.createReadStream('/path/to/tile-file').pipe(split()),
	// ...
})

Source Cover

When using MBTiles sources, a list of tiles to process can be automatically retrieved from the source metadata.

tilereduce({
	sourceCover: 'osmdata',
	sources: [
		{
			name: 'osmdata',
			mbtiles: __dirname+'/latest.planet.mbtiles'
		}
	],
	// ...
})

Events

TileReduce returns an EventEmitter.

start

Fired once all workers are initialized and before the first tiles are sent for processing.

tilereduce({/* ... */})
.on('start', function () {
	console.log('starting');
});

map

Fired just before a tile is sent to a worker. Receives the tile and the ID of the worker assigned to process it.

tilereduce({/* ... */})
.on('map', function (tile, workerId) {
	console.log('about to process ' + JSON.stringify(tile) +' on worker '+workerId);
});

reduce

Fired when a tile has finished processing. Receives data returned in the map function's done callback (if any), and the tile.

var count = 0;
tilereduce({/* ... */})
.on('reduce', function (result, tile) {
	console.log('got a count of ' + result + ' from ' + JSON.stringify(tile));
	// add this tile's count to the running total
	count += result;
});

end

Fired when all queued tiles have been processed. Use this event to output final reduce results.

var count = 0;
tilereduce({/* ... */})
.on('end', function () {
	console.log('Total count was: ' + count);
});

Processor Examples

Development

Testing

npm test

Linting

npm run lint

Test Coverage

npm run cover