
A kernel that enables applications to interact with Apache Spark.

Primary language: Scala. License: Apache License 2.0 (Apache-2.0).

Join the chat at https://gitter.im/ibm-et/spark-kernel

Spark Kernel

The main goal of the Spark Kernel is to provide the foundation for interactive applications to connect to and use Apache Spark.

Overview

The Spark Kernel provides an interface that allows clients to interact with a Spark cluster. Clients can send libraries and snippets of code that are interpreted and run against a preconfigured Spark context. These snippets can do a variety of things:

  1. Define and run Spark jobs of all kinds
  2. Collect results from Spark and push them to the client
  3. Load necessary dependencies for the running code
  4. Start and monitor a stream
  5. ...

The kernel's main supported language is Scala, but it is also capable of processing both Python and R. It implements the latest Jupyter message protocol (5.0), so it can easily plug into the 3.x branch of Jupyter/IPython for quick, interactive data exploration.
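
To make the client side of this more concrete, here is a minimal sketch of the kind of execute_request message a client might build under the Jupyter message protocol 5.0, with a Scala snippet as the code to run. The function name make_execute_request, the default username, and the snippet itself are hypothetical and chosen only for illustration; the snippet assumes the preconfigured Spark context is exposed to user code as sc. The sketch only builds and prints the message. Signing it and sending it to the kernel over ZeroMQ is out of scope here.

```python
import json
import uuid
from datetime import datetime, timezone

# Protocol version named in this README.
PROTOCOL_VERSION = "5.0"


def make_execute_request(code, session_id, username="client"):
    """Build an execute_request message as a plain dict.

    Hypothetical helper for illustration; it is not part of the Spark Kernel.
    """
    header = {
        "msg_id": str(uuid.uuid4()),      # unique id for this message
        "username": username,             # assumed placeholder user name
        "session": session_id,            # id shared by all messages of one client session
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": PROTOCOL_VERSION,
    }
    content = {
        "code": code,                     # source code the kernel should interpret
        "silent": False,                  # let the kernel publish output
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
    }
    return {
        "header": header,
        "parent_header": {},              # empty: this message is not a reply
        "metadata": {},
        "content": content,
    }


# Illustrative Scala snippet; assumes the Spark context is available as `sc`.
scala_snippet = "sc.parallelize(1 to 100).map(_ * 2).reduce(_ + _)"

message = make_execute_request(scala_snippet, session_id=str(uuid.uuid4()))
print(json.dumps(message, indent=2))
```

In practice a Jupyter notebook or client library constructs and sends these messages for you; the sketch is only meant to show what travels between a client and the kernel.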

Try It

A version of the Spark Kernel is deployed as part of the Try Jupyter! site. Select Scala 2.10.4 (Spark 1.4.1) under the New dropdown. Note that this version only supports Scala.

Develop

Vagrant is used to simplify the development experience. It is the only requirement for building and testing the Spark Kernel on your development machine.

To interact with the Spark Kernel using Jupyter, run

make dev

This will start a Jupyter notebook server accessible at http://192.168.44.44:8888. From here you can create notebooks that use the Spark Kernel configured for local mode.

Build & Package

To build and package the Spark Kernel, run

make build

The resulting package of the kernel will be located at ./kernel/target/pack. It contains a Makefile that can be used to install the Spark Kernel by running make install within the directory. More details about building and packaging can be found here.

Version

Our goal is to keep master up to date with the latest version of Spark. When new versions of Spark require code changes, we create a separate branch. The table below shows what is available now.

Branch        Spark Kernel Version  Apache Spark Version
master        0.1.5                 1.5.1
branch-0.1.4  0.1.4                 1.4.1
branch-0.1.3  0.1.3                 1.3.1

Please note that, for the most part, new features for the Spark Kernel will only be added to the master branch.

Resources

There is more detailed information available in our Wiki and our Getting Started guide.