Pinned Repositories
armeria
Asynchronous RPC/REST library built on top of Java 8, Netty, HTTP/2, Thrift and gRPC
beam
Mirror of Apache Beam
DataflowPythonSDK
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
hadoop-lzo
Patched, refactored version of code.google.com/hadoop-gpl-compression for Hadoop 0.20
hive
Mirror of Apache Hive
kafka
Mirror of Apache Kafka
presto
Distributed SQL query engine for running interactive analytic queries against big data sources.
shaded-protobuf-classes
A tiny project to create shaded Protobuf Java classes suitable for Spark's Protobuf connector
zkclient
A ZooKeeper client that makes life a little easier.
rangadi's Repositories
rangadi/shaded-protobuf-classes
A tiny project to create shaded Protobuf Java classes suitable for Spark's Protobuf connector
rangadi/elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
rangadi/DataflowPythonSDK
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
rangadi/hadoop-lzo
Patched, refactored version of code.google.com/hadoop-gpl-compression for Hadoop 0.20
rangadi/hive
Mirror of Apache Hive
rangadi/kafka
Mirror of Apache Kafka
rangadi/presto
Distributed SQL query engine for running interactive analytic queries against big data sources.
rangadi/zkclient
A ZooKeeper client that makes life a little easier.
rangadi/armeria
Asynchronous RPC/REST library built on top of Java 8, Netty, HTTP/2, Thrift and gRPC
rangadi/beam
Mirror of Apache Beam
rangadi/cascading
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing workflows on a Hadoop cluster.
rangadi/DataflowJavaSDK
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
rangadi/dataproc-initialization-actions
Run on all nodes of your cluster before the cluster starts, which lets you customize your cluster.
rangadi/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
rangadi/flink-dataflow
Google Dataflow Runner for Apache Flink
rangadi/hadoop-common
Mirror of Apache Hadoop common
rangadi/lzo-split
rangadi/misc
rangadi/modeldb
A system to manage machine learning models
rangadi/scio
A Scala API for Google Cloud Dataflow
rangadi/scribe
Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.
rangadi/snakebite
A pure python HDFS client
rangadi/spark
Apache Spark
rangadi/summingbird
Streaming MapReduce with Scalding and Storm
rangadi/trevni
A column file format
rangadi/zookeeper
Mirror of Apache Hadoop ZooKeeper