amuise's Stars
thunderain-project/StreamSQL
Mirror of Apache Spark
cwensel/cascading
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster.
tribbloid/spookystuff
Scalable query engine for web scraping/data mashup/acceptance QA, powered by Apache Spark
jmxtrans/jmxtrans
jmxtrans
apache/kafka
Mirror of Apache Kafka
arianpasquali/storm-solr
Storm Solr Integration
flumebase/flumebase
Continuous Streaming SQL Queries for Flume
devopscloudorg/azure-hdp
Scripts to Automate HDP deployment on Windows Azure Virtual Machines (Linux)
rbalamohan/tez-autobuild
A Tez dev-setup for HDP2 sandbox
openworm/OpenWorm
Repository for the main Dockerfile with the OpenWorm software stack and project-wide issues
jrkinley-zz/storm-hbase
An HBase connector for Storm
rjohnsondev/java-libpst
A library to read PST files with Java, without the need for external libraries.
BenLangmead/bowtie
An ultrafast memory-efficient short read aligner
twitter/summingbird
Streaming MapReduce with Scalding and Storm
edwardcapriolo/filecrush
Remedy small files by combining them into larger ones.
apache/falcon
Mirror of Apache Falcon
blackberry/hadoop-logdriver
A logdriver for Apache Hadoop
spotify/hadoop-openpgp-codec
Codec for Hadoop adding OpenPGP encryption using Bouncy Castle
Esri/spatial-framework-for-hadoop
The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis.
Esri/gis-tools-for-hadoop
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
LinkedInAttic/datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
msukmanowsky/OmnitureTextLoader
An Apache Pig UDF for reading and parsing content from raw Omniture log data.
apache/hive
Apache Hive
ptwobrussell/Mining-the-Social-Web
The official online compendium for Mining the Social Web (O'Reilly, 2011)
sriksun/Ivory
Data Management + Feed Processing Platform over Hadoop
nathanmarz/storm
Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more
twitter/elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
linkedin/databus
Source-agnostic distributed change data capture system
brianfrankcooper/YCSB
Yahoo! Cloud Serving Benchmark