
Implicit Parallelism for Scalable Data Analysis.

Primary language: Scala. License: Apache License 2.0.

Emma

A quotation-based Scala DSL for scalable data analysis.


Goals

Our goal is to improve developer productivity by hiding parallelism aspects behind a high-level, declarative API which maximises reuse of native Scala syntax and constructs.

Emma supports state-of-the-art dataflow engines such as Apache Flink and Apache Spark as backend co-processors.

Features

DSLs for scalable data analysis are typically embedded through types. In contrast, Emma is based on quotations (similar to Quill). This approach has two benefits.

First, it allows Scala-native, declarative constructs to be reused in the DSL. Quoted Scala syntax such as for-comprehensions, case classes, and pattern matching is thereby lifted to an intermediate representation called Emma Core.

Second, it allows Emma Core terms to be analyzed and optimized holistically. Subterms of type DataBag[A] are thereby transformed and off-loaded to a parallel dataflow engine such as Apache Flink or Apache Spark.
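To illustrate the kind of Scala-native code that quotations are meant to preserve, here is a minimal sketch of a join written as a plain for-comprehension over case classes. It does not use the Emma API: LocalBag is a hypothetical, purely local stand-in for DataBag[A], and Movie, Rating, and all values are made up for this example. In Emma, a comprehension of this shape would be quoted, lifted to Emma Core, and its DataBag[A] subterms off-loaded to Flink or Spark; here it is simply evaluated in memory.

```scala
// Hypothetical stand-in for Emma's DataBag[A]; NOT the real Emma API.
// It wraps a local Seq so the sketch runs without any dataflow engine.
final case class LocalBag[A](values: Seq[A]) {
  def map[B](f: A => B): LocalBag[B] =
    LocalBag(values.map(f))
  def flatMap[B](f: A => LocalBag[B]): LocalBag[B] =
    LocalBag(values.flatMap(a => f(a).values))
  def withFilter(p: A => Boolean): LocalBag[A] =
    LocalBag(values.filter(p))
}

// Illustrative record types (assumed for this sketch).
final case class Movie(id: Int, title: String)
final case class Rating(movieId: Int, stars: Int)

object ForComprehensionSketch {
  val movies: LocalBag[Movie] =
    LocalBag(Seq(Movie(1, "A"), Movie(2, "B")))
  val ratings: LocalBag[Rating] =
    LocalBag(Seq(Rating(1, 5), Rating(2, 3), Rating(1, 4)))

  // A join plus filter, expressed declaratively. The compiler desugars
  // this into flatMap / withFilter / map calls on LocalBag.
  val highlyRated: LocalBag[(String, Int)] = for {
    m <- movies
    r <- ratings
    if m.id == r.movieId && r.stars >= 4
  } yield (m.title, r.stars)

  def main(args: Array[String]): Unit =
    highlyRated.values.foreach(println)
}
```

The point of the sketch is that the join is stated as a comprehension rather than as explicit engine operators. A quotation-based DSL can inspect the whole comprehension at once, which is what makes the holistic optimization described above possible.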

Examples

The emma-examples module contains examples from various fields.

Learn More

Check emma-language.org for further information.

Build

Building Emma requires:

  • JDK 7+ (preferably JDK 8)
  • Maven 3

To build Emma without running any tests, run:

mvn clean package -DskipTests

For more advanced build options, including integration tests for the target runtimes, see the "Building Emma" section in the Wiki.