R Support for Apache Storm Multilang Protocol
Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation.
Storm includes a “Multi-Language” (or “Multilang”) Protocol to allow implementation of Bolts and Spouts in languages other than Java. This R extension provides implementations of utility functions to allow an application developer to focus on application-specific functionality rather than Storm/R communications plumbing.
From Storm’s point of view, it creates an R process to consume and produce Tuples. Storm communicates with R using a JSON-like format: it writes Tuples to the R process's STDIN and reads Tuples from the R process's STDOUT. The Storm package implements several functions to take care of Storm/R I/O.
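To make the I/O concrete, here is a minimal base-R sketch of the message framing, assuming the framing described on the Multilang protocol wiki page linked at the end of this document: each message is JSON text followed by a line containing only "end". The helper names `read_message` and `write_message` are hypothetical and are not functions of the Storm package, which handles this for you. The sample tuple message is abbreviated.

```r
# Read one message: every line up to (but not including) the "end" line.
# Returns NULL if the input ends before an "end" line is seen.
read_message <- function(con) {
  lines <- character(0)
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) return(NULL)  # input closed
    if (line == "end") break
    lines <- c(lines, line)
  }
  paste(lines, collapse = "\n")
}

# Write one message: the JSON text, then "end" on its own line.
write_message <- function(json, con = stdout()) {
  writeLines(c(json, "end"), con)
  flush(con)  # Storm is waiting on the pipe, so do not leave output buffered
}

# Usage with an in-memory connection standing in for STDIN.
# (In a real Rscript process, file("stdin") opens the process's standard input.)
input <- textConnection(c('{"id":"1","tuple":["hello"]}', "end"))
msg <- read_message(input)
close(input)

# Acknowledge the tuple on STDOUT.
write_message('{"command":"ack","id":"1"}')
```

The sketch leaves the JSON itself as an opaque string; parsing and building it would normally be delegated to a JSON library.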
As the application programmer, you implement a single function with the signature “function(s=Storm, t=Tuple)”, which is called once per Tuple. Inside this function, you can emit zero or more Tuples, and you can also send status messages such as failures and diagnostic messages.
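The shape of such a function can be sketched without a running Storm cluster. The example below is an illustration only: `make_recorder` is a hypothetical stand-in for the Storm object that records messages in memory, and the method names `emit`, `fail` and `log`, as well as the Tuple fields `id` and `input`, are assumptions made for this sketch. Check the package documentation for the actual interface.

```r
# Hypothetical stand-in for the Storm object: it records outgoing messages
# in memory instead of writing them to STDOUT.
make_recorder <- function() {
  sent <- list()
  record <- function(msg) sent[[length(sent) + 1]] <<- msg
  list(
    emit = function(values) record(list(command = "emit", tuple = values)),
    fail = function(id)     record(list(command = "fail", id = id)),
    log  = function(text)   record(list(command = "log", msg = text)),
    sent = function() sent
  )
}

# Per-Tuple function: splits the first input field into words and emits
# one output Tuple per word.
split_words <- function(s, t) {
  text <- t$input[[1]]
  if (!is.character(text)) {
    s$fail(t$id)                      # status message: failure
    return(invisible(NULL))
  }
  words <- strsplit(text, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]       # drop empty strings from leading spaces
  if (length(words) == 0) s$log("no words in tuple")  # diagnostic message
  for (w in words) s$emit(w)          # zero or more output Tuples
  invisible(NULL)
}

# Usage: one input Tuple in, three recorded "emit" messages out.
s <- make_recorder()
split_words(s, list(id = "1", input = list("storm speaks r")))
```

Keeping the function free of direct I/O, as here, makes it easy to exercise with a fake object before wiring it into a topology.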
To use this extension, briefly:
- create a new Storm object.
- define a function that can process and emit Tuple objects.
- call the run() method on the Storm object.
A detailed example is given in the examples section.
You'll find two directories here:
- java/ - contains a Maven project that tests R with a very simple string permutation test. Run it with 'mvn test'. You'll need to have the "Storm" and "permute" R packages installed.
- Storm/ - contains the R Storm package itself. To use it, install it either from CRAN (e.g. 'install.packages("Storm")') or from this directory with 'R CMD INSTALL Storm'.
Further reading on the Multilang protocol:
- https://github.com/nathanmarz/storm/wiki/Using-non-JVM-languages-with-Storm
- https://github.com/nathanmarz/storm/wiki/Multilang-protocol