A curated list of awesome Apache Spark packages and resources.
- Succinct - Support for efficient queries on compressed data.
- Spark CSV - CSV reader and writer.
- Spark Avro - Apache Avro reader and writer.
- Spark XML - XML parser and writer.
- Spark-Mongodb - MongoDB reader and writer.
- Spark Cassandra Connector - Cassandra support, including a data source, an API, and support for arbitrary queries.
- Spark Riak Connector - Riak TS & Riak KV connector.
- Mongo-Spark - Official MongoDB Spark Connector.
- ADAM - a set of tools designed to analyse genomics data.
- Magellan - Geospatial Analytics Using Spark.
- GeoSpark - A Cluster Computing System for Processing Large-Scale Spatial Data.
- Spark-Timeseries - A Scala / Java / Python library for interacting with time series data on Apache Spark.
- Mazerunner - graph analytics platform on top of Neo4j and GraphX.
- GraphFrames - DataFrame-based graph API.
- dbscan-on-spark - An implementation of the DBSCAN clustering algorithm on top of Apache Spark by irvingc, based on the paper by He, Yaobin, et al., "MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data".
- spark_dbscan - Another implementation of the DBSCAN clustering algorithm on top of Apache Spark, by alitouka.
- Apache SystemML - declarative machine learning framework on top of Spark.
- Mahout Spark Bindings - linear algebra DSL and optimizer with R-like syntax.
- spark-sklearn - scikit-learn integration with distributed model training.
- KeystoneML - type safe machine learning pipelines with RDDs.
- Mastering Apache Spark.
- Learning Spark: Lightning-Fast Big Data Analysis.
- Advanced Analytics with Spark.
- Oryx 2 - a lambda architecture built on Apache Spark and Apache Kafka with specialization for real-time large scale machine learning.
- PredictionIO - Machine Learning server for developers and data scientists to build and deploy predictive applications in a fraction of the time.
This work (Awesome Spark, by https://github.com/awesome-spark/awesome-spark), identified by Maciej Szymkiewicz, is free of known copyright restrictions.