

Limbo


Raison d'être:

Note: Limbo is a work in progress (WIP).

Limbo is a Scala API that lets you leverage the best of different data processing frameworks by enabling seamless transitions between framework-specific data structures.

Features

  • Integration between Scio and Spark
  • Programmatic Spark job submission to an Apache YARN cluster
  • Scala API for Google Dataproc cluster

Example:

// Start in Scio:
val (sc, args) = ContextAndArgs(argv)
val scol = sc.parallelize(1 to 10)

// Move to Spark realm
scol.toRDD().map { rdd =>
  rdd
    .map(_ * 2)
    .saveAsTextFile(args("output"))
}
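In the example above, `toRDD()` reads like a method on the Scio collection. In Scala, a common way to attach such a conversion to an existing type is an implicit class (an extension method). The sketch below shows only that general pattern. It uses hypothetical stand-in types `ScioColl` and `SparkColl` and a hypothetical `toSparkColl()` method, does not depend on Scio, Spark, or Limbo, and is not meant to describe how Limbo itself is implemented.

```scala
// Hypothetical stand-ins for framework-specific collections. These are NOT
// Scio's SCollection or Spark's RDD; they are plain wrappers around a Seq.
final case class ScioColl[T](elems: Seq[T])

final case class SparkColl[T](elems: Seq[T]) {
  def map[U](f: T => U): SparkColl[U] = SparkColl(elems.map(f))
  def collect(): Seq[T] = elems
}

object Conversions {
  // Extension method: once Conversions._ is imported, every ScioColl[T]
  // gains a toSparkColl() method that rewraps its elements as a SparkColl[T].
  implicit class ScioCollOps[T](val self: ScioColl[T]) extends AnyVal {
    def toSparkColl(): SparkColl[T] = SparkColl(self.elems)
  }
}

object Demo {
  import Conversions._

  // Start with the "Scio-like" collection, cross over, then keep working
  // with the "Spark-like" API, mirroring the shape of the README example.
  def run(): Seq[Int] = {
    val scol = ScioColl(1 to 10)
    scol.toSparkColl().map(_ * 2).collect()
  }
}
```

Calling `Demo.run()` yields the doubled values 2 through 20. The design choice illustrated is that the conversion lives outside both collection types, so neither "framework" needs to know about the other; only code that imports the conversions sees the extra method.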

Code of conduct

This project adheres to the Open Code of Conduct. By participating, you are expected to honor this code.

License

Copyright 2016 Spotify AB.

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0