High performance distributed data processing engine

skale-engine

Join the chat at https://gitter.im/skale-me/skale-engine

Skale-engine is a fast, general-purpose distributed data processing system. It provides a high-level API in JavaScript and an optimized parallel execution engine on top of Node.js.

Word count using skale:

var sc = require('skale-engine').context();

sc.textFile('/path/...')
  .flatMap(line => line.split(' '))
  .map(word => [word, 1])
  .reduceByKey((a, b) => a + b, 0)
  .count().then(console.log);
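
To make the dataflow concrete, here is a minimal sketch in plain JavaScript, with no skale-engine dependency, of what each stage of the pipeline computes. The `lines` array and the `reduceByKey` helper are hypothetical stand-ins written for this illustration; they show the semantics only and say nothing about how skale-engine distributes the work. Note that the final `count()` yields the number of distinct words, since it counts the `[word, total]` pairs left after the reduction.

```javascript
// Plain JavaScript sketch of the word-count stages; no skale-engine required.
// `lines` stands in for the records that sc.textFile() would produce.
const lines = ['to be or', 'not to be'];

// flatMap: split every line into words and flatten into a single list.
const words = lines.flatMap(line => line.split(' '));

// map: pair each word with an initial count of 1.
const pairs = words.map(word => [word, 1]);

// reduceByKey (hypothetical local helper): combine the values that share a key,
// starting each key's accumulator at `init`.
function reduceByKey(entries, reducer, init) {
  const acc = new Map();
  for (const [key, value] of entries) {
    acc.set(key, reducer(acc.has(key) ? acc.get(key) : init, value));
  }
  return [...acc.entries()];
}
const counts = reduceByKey(pairs, (a, b) => a + b, 0);

// count: the number of [word, total] pairs, i.e. the number of distinct words.
console.log(counts);        // [ [ 'to', 2 ], [ 'be', 2 ], [ 'or', 1 ], [ 'not', 1 ] ]
console.log(counts.length); // 4
```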

Features

  • In-memory computing
  • Controlled memory usage, spill to disk when necessary
  • Fast multiple distributed streams
  • Real-time lazy compilation and execution of execution graphs
  • Workers can connect through TCP or WebSockets
  • Very fast, see benchmark
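
The idea of lazily built execution graphs can be sketched as follows. This is a conceptual illustration only: `LazyDataset` is a hypothetical class invented for this example and is not part of the skale-engine API. It assumes that transformations merely record steps and that an action, which returns a promise as in the word count above, triggers the actual computation.

```javascript
// Conceptual sketch of lazy evaluation; LazyDataset is a hypothetical class,
// not part of the skale-engine API.
class LazyDataset {
  constructor(source, steps = []) {
    this.source = source; // input array
    this.steps = steps;   // recorded transformations, not yet applied
  }

  // Transformations only append a step to the graph and return a new dataset.
  map(fn) {
    return new LazyDataset(this.source, [...this.steps, data => data.map(fn)]);
  }

  filter(fn) {
    return new LazyDataset(this.source, [...this.steps, data => data.filter(fn)]);
  }

  // An action runs the recorded steps in order and resolves with the result.
  collect() {
    const output = this.steps.reduce((data, step) => step(data), this.source);
    return Promise.resolve(output);
  }
}

let calls = 0;
const graph = new LazyDataset([1, 2, 3, 4])
  .map(x => { calls += 1; return x * 10; })
  .filter(x => x > 20);

const callsBeforeAction = calls; // still 0: building the graph ran nothing
const result = graph.collect();  // the action triggers the computation
result.then(values => console.log(callsBeforeAction, values)); // 0 [ 30, 40 ]
```

Deferring work until an action is requested lets an engine see the whole chain of transformations before running any of it, which is what makes planning and optimizing the execution possible.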

Docs & community

Quickstart

The best and quickest way to get started with skale-engine is to use skale to create, run and deploy skale applications.

$ sudo npm install -g skale  # Install skale command once and for all
$ skale create my_app        # Create a new app, install skale-engine
$ cd my_app
$ skale run                  # Start a local cluster if necessary and run the app

Examples

The following steps bypass the skale toolbelt and use skale-engine directly. They are intended for readers who are more interested in the skale-engine architecture, details and internals.

To run the internal examples, clone the skale-engine repository and install the dependencies:

$ git clone https://github.com/skale-me/skale-engine.git --depth 1
$ cd skale-engine
$ npm install

Then start a skale-engine server and workers on localhost:

$ npm start

Then run whichever example you want:

$ ./examples/wordcount.js /etc/hosts

Tests

To run the test suite, first install the dependencies, then run npm test:

$ npm install
$ npm test

People

The original authors of skale-engine are Cedric Artigue and Marc Vertes.

List of all contributors

License

Apache-2.0