Emma

A quotation-based Scala DSL for scalable data analysis.

Goals

Our goal is to improve developer productivity by hiding parallelism aspects behind a high-level, declarative API which maximises reuse of native Scala syntax and constructs.

Emma supports state-of-the-art dataflow engines such as Apache Flink and Apache Spark as backend co-processors.

Features

DSLs for scalable data analysis are typically embedded through types. In contrast, Emma is based on quotations (similar to Quill). This approach has two benefits.

First, it allows native, declarative Scala constructs to be reused in the DSL. Quoted Scala constructs such as for-comprehensions, case classes, and pattern matching are lifted to an intermediate representation called Emma Core.
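As a rough illustration of the constructs named above, the sketch below writes a small join as a for-comprehension with a case-class pattern, next to approximately what the Scala compiler desugars it to. It uses plain `Seq` as a stand-in for a distributed collection and does not call any Emma API; the `Author` and `Paper` types and both method names are hypothetical and exist only for this example.

```scala
object ComprehensionSketch {
  // Hypothetical record types, introduced only for this illustration.
  case class Author(id: Int, name: String)
  case class Paper(authorId: Int, title: String)

  // A join expressed with native Scala syntax: a for-comprehension,
  // a case-class pattern in the generator, and a guard.
  def titlesByAuthor(authors: Seq[Author], papers: Seq[Paper]): Seq[(String, String)] =
    for {
      Author(id, name) <- authors
      paper <- papers
      if paper.authorId == id
    } yield (name, paper.title)

  // Roughly the same program after desugaring into
  // flatMap / withFilter / map calls.
  def titlesByAuthorDesugared(authors: Seq[Author], papers: Seq[Paper]): Seq[(String, String)] =
    authors.flatMap { case Author(id, name) =>
      papers.withFilter(_.authorId == id).map(paper => (name, paper.title))
    }
}
```

The first form is the kind of declarative code a quotation-based DSL can accept as-is; the second shows the operator chain hidden behind it.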

Second, it allows Emma Core terms to be analyzed and optimized holistically. Subterms of type DataBag[A] are transformed and off-loaded to a parallel dataflow engine such as Apache Flink or Apache Spark.
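To give a feel for what optimizing a whole term can mean, here is a minimal sketch of a toy intermediate representation with one rewrite rule that fuses consecutive map operators. It is not Emma Core and does not reflect Emma's actual optimizations; the `Term` hierarchy and the `fuse`, `eval`, and `countMaps` functions are assumptions made up for this example.

```scala
object FusionSketch {
  // Toy IR for bag-like pipelines over Int. Hypothetical, not Emma Core.
  sealed trait Term
  case class Source(values: Seq[Int]) extends Term
  case class MapOp(input: Term, f: Int => Int) extends Term
  case class FilterOp(input: Term, p: Int => Boolean) extends Term

  // Rewrite rule: a map over a map becomes a single map
  // that applies the inner function first, then the outer one.
  def fuse(term: Term): Term = term match {
    case MapOp(input, g) =>
      fuse(input) match {
        case MapOp(inner, f) => MapOp(inner, f andThen g)
        case other           => MapOp(other, g)
      }
    case FilterOp(input, p) => FilterOp(fuse(input), p)
    case source: Source     => source
  }

  // Reference interpreter, used to check that rewriting preserves results.
  def eval(term: Term): Seq[Int] = term match {
    case Source(values)     => values
    case MapOp(input, f)    => eval(input).map(f)
    case FilterOp(input, p) => eval(input).filter(p)
  }

  // Counts the map operators that remain in a term.
  def countMaps(term: Term): Int = term match {
    case Source(_)          => 0
    case MapOp(input, _)    => 1 + countMaps(input)
    case FilterOp(input, _) => countMaps(input)
  }
}
```

Because the rewriter sees the entire term rather than one operator call at a time, it can apply rules like this across operator boundaries before anything is handed to an execution engine.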

Examples

The emma-examples module contains examples from various fields.

Graph Analysis
Supervised Learning
- Naive Bayes Classification
Unsupervised Learning
- k-Means Clustering
Text Processing
- Word Count

Learn More

Check emma-language.org for further information.

Build

JDK 7+ (preferably JDK 8)
Maven 3

To build Emma without running any tests, run

mvn clean package -DskipTests