Enabling queries on compressed data.

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

Succinct

Succinct is a data store that enables queries directly on a compressed representation of data. This repository maintains the Java implementations of Succinct's core algorithms and the applications that build on them, such as an Apache Spark binding for Succinct.

Building Succinct

Succinct is built using Apache Maven. To build Succinct and its component modules, run:

mvn clean package

Succinct-Core

The Succinct-Core module contains the Java implementation of Succinct's core algorithms. See a more detailed description of the core module here.
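To give a rough feel for what querying an indexed representation of text involves, here is a minimal sketch. It is not Succinct's API or implementation: it swaps in a plain, uncompressed suffix array, and the class `SuffixArraySketch` and its methods `count`, `search`, and `extract` are hypothetical names chosen for illustration only. It shows how substring queries can be answered by binary search over sorted suffix offsets instead of scanning the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not part of Succinct. Uses an UNCOMPRESSED suffix array. */
final class SuffixArraySketch {
    private final String text;
    private final Integer[] suffixes; // start offsets of all suffixes, in lexicographic order

    SuffixArraySketch(String text) {
        this.text = text;
        this.suffixes = new Integer[text.length()];
        for (int i = 0; i < suffixes.length; i++) {
            suffixes[i] = i;
        }
        // Naive construction: sort offsets by the suffix that starts at each offset.
        Arrays.sort(suffixes, (a, b) -> text.substring(a).compareTo(text.substring(b)));
    }

    /** Compares the suffix at offset, truncated to the query's length, with the query. */
    private int comparePrefix(int offset, String query) {
        int end = Math.min(text.length(), offset + query.length());
        return text.substring(offset, end).compareTo(query);
    }

    /** First index whose truncated suffix is >= query (strict = false) or > query (strict = true). */
    private int bound(String query, boolean strict) {
        int lo = 0;
        int hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = comparePrefix(suffixes[mid], query);
            if (c < 0 || (strict && c == 0)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Number of occurrences of query in the text. */
    int count(String query) {
        return bound(query, true) - bound(query, false);
    }

    /** Offsets of all occurrences of query, in ascending order. */
    List<Integer> search(String query) {
        int from = bound(query, false);
        int to = bound(query, true);
        List<Integer> offsets = new ArrayList<>();
        for (int i = from; i < to; i++) {
            offsets.add(suffixes[i]);
        }
        Collections.sort(offsets);
        return offsets;
    }

    /** Up to length characters of the text starting at offset. */
    String extract(int offset, int length) {
        return text.substring(offset, Math.min(text.length(), offset + length));
    }
}
```

For example, with `new SuffixArraySketch("banana")`, `count("ana")` returns 2 and `search("ana")` returns the offsets 1 and 3. A real compressed index would answer the same kinds of queries without keeping the original text or a full suffix array in memory; this sketch keeps both for simplicity.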

Dependency Information

Apache Maven

To build your application with Succinct-Core, you can link against this library using Maven by adding the following dependency information to your pom.xml file:

<dependency>
    <groupId>amplab</groupId>
    <artifactId>succinct-core</artifactId>
    <version>0.1.8</version>
</dependency>

Succinct on Apache Spark

We provide Apache Spark and Apache Spark SQL interfaces for Succinct. These expose SuccinctRDD, a compressed, queryable RDD for manipulating unstructured data, and SuccinctKVRDD for querying semi-structured data (key-value pairs, text and JSON documents, etc.). As an experimental feature, we also expose Succinct as a DataSource in Apache Spark SQL. More details on the integration with Apache Spark can be found here.

Dependency Information

Apache Maven

To build your application to run with Succinct on Apache Spark, you can link against this library using Apache Maven by adding the following dependency information to your pom.xml file:

<dependency>
    <groupId>amplab</groupId>
    <artifactId>succinct-spark</artifactId>
    <version>0.1.8</version>
</dependency>

SBT and Spark-Packages

Add the dependency to your SBT project by adding the following to build.sbt (see the Spark Packages listing for spark-submit and Maven instructions):

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "amplab" % "succinct" % "0.1.8"

The succinct-spark jar file can also be added to a Spark shell using the --jars command line option. For example, to include it when starting the Spark shell:

$ bin/spark-shell --jars succinct-0.1.8.jar