Pinned Repositories
cascading
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster.
circus-train
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
waggle-dance
Hive federation service. Enables disparate tables to be accessed concurrently across multiple Hive deployments.
aws-glue-data-catalog-client-for-apache-hive-metastore
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore-compatible metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-the-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog as an external Hive Metastore. It serves as a reference implementation for building a Hive Metastore-compatible client that connects to the AWS Glue Data Catalog. It may be ported to other Hive Metastore-compatible platforms, such as other Hadoop and Apache Spark distributions.
cascading
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows on a Hadoop cluster. See https://github.com/Cascading/cascading for the release repository.
corc
An ORC File Scheme for the Cascading data processing platform.
GameOfLife
Playing around and learning Scala.
jdeb
This library provides an Ant task and a Maven plugin to create Debian packages from Java builds in a truly cross-platform manner.
maven-sandbox
plunger
A unit testing framework for the Cascading data processing platform.
patduin's Repositories
patduin/GameOfLife
Playing around and learning Scala.
patduin/aws-glue-data-catalog-client-for-apache-hive-metastore
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore-compatible metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-the-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog as an external Hive Metastore. It serves as a reference implementation for building a Hive Metastore-compatible client that connects to the AWS Glue Data Catalog. It may be ported to other Hive Metastore-compatible platforms, such as other Hadoop and Apache Spark distributions.
patduin/cascading
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows on a Hadoop cluster. See https://github.com/Cascading/cascading for the release repository.
patduin/corc
An ORC File Scheme for the Cascading data processing platform.
patduin/jdeb
This library provides an Ant task and a Maven plugin to create Debian packages from Java builds in a truly cross-platform manner.
patduin/maven-sandbox
patduin/plunger
A unit testing framework for the Cascading data processing platform.