data-processing

Pinned Repositories

incubator-samza
Mirror of Apache Samza
Language: Scala
kafka-embedded
Runs embedded, in-memory Apache Kafka instances. Helpful for integration testing.
Language: Scala
kafka-manager
A tool for managing Apache Kafka.
Language: Scala
kafka-spark-consumer
Language: Java
kafka-storm-starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+, while using Apache Avro as the data serialization format.
Language: Scala
kangaroo
Hadoop utilities for Kafka
Language: Java
klio
Smarter data pipelines for audio.
Language: Python
mpire
A Python package for easy multiprocessing, but faster than multiprocessing
Language: Python
Neuraxle
Build neat pipelines with the right abstractions to do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.
Language: Python
rabit
Reliable Allreduce and Broadcast Interface for distributed machine learning
Language: C++

data-processing's Repositories

data-processing/dpark
Python clone of Spark, a MapReduce-like framework in Python
data-processing/kafka-spark-consumer
data-processing/spindle
Next-generation web analytics processing with Scala, Spark, and Parquet.
data-processing/streamparse
streamparse lets you run Python code against real-time streams of data. Integrates with Apache Storm.
data-processing/fluid
data-processing/cassovary
Cassovary is a simple big graph processing library for the JVM
data-processing/snowplow
Enterprise-strength web and event analytics, powered by Hadoop, Kinesis, Redshift and Postgres
data-processing/druid
Real²time Exploratory Analytics on Large Datasets
data-processing/grill
data-processing/spark-ec2
Scripts used to set up a Spark cluster on EC2
data-processing/kafka-storm-starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+, while using Apache Avro as the data serialization format.
data-processing/incubator-samza
Mirror of Apache Samza
data-processing/cdk
Cloudera Development Kit
data-processing/crunch
Crunch is an Apache TLP now, and lives at http://crunch.apache.org/
data-processing/Impatient
Source examples to support the "Cascading for the Impatient" blog post series
data-processing/exelixi
Exelixi is a distributed framework based on Apache Mesos, mostly implemented in Python using gevent for high-performance concurrency. It is intended to run cluster computing jobs (partitioned batch jobs, which include some messaging) in pure Python. By default, it runs genetic algorithms at scale.
data-processing/storm-yarn
Storm for YARN