Saranath's Stars
elastic/elasticsearch
Free and Open Source, Distributed, RESTful Search Engine
nathanmarz/storm
Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more
tomwhite/hadoop-book
Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White
Twitter4J/Twitter4J
Twitter4J is an open-source Java library for the Twitter API.
twitter/elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
cloudera/flume
WE HAVE MOVED to Apache Incubator (https://cwiki.apache.org/FLUME/). Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault-tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple, extensible data model that allows for online analytic applications.
nathanmarz/storm-starter
Learn to use Storm!
nathanmarz/storm-contrib
A collection of spouts, bolts, serializers, DSLs, and other goodies to use with Storm
commoncrawl/commoncrawl
Common Crawl support library to access 2008-2012 crawl archives (ARC files)
cwensel/cascading
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.
commoncrawl/commoncrawl-crawler
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)
alexholmes/hadoop-book
Source code to accompany the book "Hadoop in Practice", published by Manning.
nathanmarz/kafka-deploy
Automated deploy for Kafka on AWS
datasalt/pangool
Tuple MapReduce for Hadoop: Hadoop API made easy
OpenDDRdotORG/OpenDDR-Java
Java Implementation of OpenDDR-Simple-API
storm-book/examples-ch06-real-life-app
A Storm-Based DRPC Search Engine
square/cascading2-protobufs
Cascading 2 library for working with Protocol Buffers (Scheme, Serialization, and maybe even some functions/filters)
nathanmarz/trident-kafka
NOTE: This project has been moved into storm-kafka in storm-contrib
inadco/HBase-Lattice
stealth mode at this point