
A curated list of awesome Apache Spark packages and resources.

License: Creative Commons Zero v1.0 Universal (CC0-1.0)

Awesome Spark



Language bindings

Notebooks and IDEs

General purpose libraries

  • Succinct - Support for efficient queries on compressed data.

SQL Data Sources

Bioinformatics

  • ADAM - A set of tools designed to analyse genomics data.

GIS

  • Magellan - Geospatial Analytics Using Spark.
  • GeoSpark - A Cluster Computing System for Processing Large-Scale Spatial Data.

Time series analytics

  • Spark-Timeseries - A Scala / Java / Python library for interacting with time series data on Apache Spark.

Graph processing

  • Mazerunner - A graph analytics platform built on top of Neo4j and GraphX.
  • GraphFrames - DataFrame-based graph API.

Machine Learning Extension

Books

MOOCs

Workshops

Projects Using Spark

  • Oryx 2 - A lambda architecture built on Apache Spark and Apache Kafka with specialization for real-time large-scale machine learning.
  • PredictionIO - Machine Learning server for developers and data scientists to build and deploy predictive applications in a fraction of the time.

Public Domain Mark
This work (Awesome Spark, by https://github.com/awesome-spark/awesome-spark), identified by Maciej Szymkiewicz, is free of known copyright restrictions.