This repository is DEPRECATED! Please use the new Sparkling Water repository: https://github.com/h2oai/sparkling-water
Makes interoperability between H2O and Spark trivial.
Requirements:
- Spark 1.0.0 (SQL component required)
- Tachyon 0.4.1
- Java 1.6+
- First, compile the latest version of Spark with the SQL component:
git clone https://github.com/apache/spark.git
cd spark
sbt/sbt assembly publish-local
- For Tachyon support, download Tachyon 0.4.1 from https://github.com/amplab/tachyon/releases/tag/v0.4.1
- Compile the Sparkling demo:
cd h2o-sparkling-demo
sbt assembly
Note: The assembly step is required because the demo acts as a Spark driver that sends a jar file containing the job implementation to the Spark cloud.
To run the demo locally (no Spark cloud is required):
- Execute an instance of H2O with an embedded Spark driver:
cd h2o-sparkling-demo
sbt "run --local"
To run the demo against a remote Spark cloud (a running Spark cloud is required):
- Start a master and one worker on the local node:
cd spark/sbin
./start-master.sh
./start-slave.sh 1 "spark://localhost:7077"
- Assemble the h2o-sparkling-demo jar file, which the driver sends to the Spark cloud, then run the demo in remote mode:
cd h2o-sparkling-demo
sbt assembly
sbt "run --remote"
- The demo project also provides an sbt task, runH2O:
cd h2o-sparkling-demo
sbt runH2O
Currently, the demo supports three extractors:
- dummy: pulls all data into the driver and creates a frame
- file: asks Spark to save the RDD as a file on the local filesystem, then parses the stored file
- tachyon: asks Spark to save the RDD to the Tachyon filesystem, then H2O loads the file from Tachyon
The extractor can be selected via the --extractor command-line parameter, e.g., --extractor=tachyon
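To illustrate the flag syntax, the following POSIX shell sketch checks an extractor name against the three extractors listed above and prints the corresponding flag. The function name extractor_flag is hypothetical; it is not part of the demo.

```shell
# Hypothetical helper (not part of h2o-sparkling-demo): validate an
# extractor name against the three supported extractors and print the flag.
extractor_flag() {
  case "$1" in
    dummy|file|tachyon)
      printf '%s\n' "--extractor=$1"
      ;;
    *)
      # Unknown name: report on stderr and signal failure.
      printf 'unknown extractor: %s (expected dummy, file or tachyon)\n' "$1" >&2
      return 1
      ;;
  esac
}

# Usage (inside h2o-sparkling-demo):
#   sbt "run --remote $(extractor_flag tachyon)"
```

Note the single "=" between the flag name and its value.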
- Start Tachyon (needed for the tachyon extractor):
cd tachyon/bin
./tachyon-start.sh
- Open http://localhost:19999/ to see the list of files stored in Tachyon, or use the tfs command:
tachyon tfs ls /
- For more details, see the instructions at http://tachyon-project.org/Running-Spark-on-Tachyon.html
Run the demo with the Tachyon-based extractor against a remote Spark cloud:
cd h2o-sparkling-demo
sbt assembly
sbt "run --remote --extractor=tachyon"
Run the airlines demo with the file-based extractor against a remote Spark cloud running at a non-default location:
sbt "run --remote --sparkMaster=spark://localhost:17077 --noshutdown --demo=airlines --extractor=file"
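The flags above can be combined. The hypothetical helper below (not part of the demo) assembles the argument string for a remote run in the same shape as the airlines command. It assumes that, when no master URL is given, spark://localhost:7077 (the master started with start-master.sh above) is the one to use.

```shell
# Hypothetical helper, not part of h2o-sparkling-demo: build the argument
# string passed to `sbt "run ..."` for a remote run with --noshutdown.
# Assumption: spark://localhost:7077 is the master started above.
demo_run_args() {
  demo="$1"                                  # e.g. airlines
  extractor="$2"                             # dummy, file or tachyon
  master="${3:-spark://localhost:7077}"      # optional non-default master URL
  printf '%s\n' "run --remote --sparkMaster=$master --noshutdown --demo=$demo --extractor=$extractor"
}

# Usage (inside h2o-sparkling-demo), equivalent to the airlines command above:
#   sbt "$(demo_run_args airlines file spark://localhost:17077)"
```

Passing the master URL as an argument (rather than an environment variable prefix) keeps the value visible to the command substitution.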