Pinned Repositories
airflow
Apache Airflow: a platform to programmatically author, schedule, and monitor workflows
gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
airflow
Apache Airflow
bdutil
dataproc-initialization-actions
Run on all nodes of your cluster before the cluster starts; lets you customize your cluster
incubator-gobblin
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google, etc.
bdutil
[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine
cloud-dataproc
Cloud Dataproc: Samples and Utils
initialization-actions
Run on all nodes of your cluster before the cluster starts; lets you customize your cluster
DanSedov's Repositories
DanSedov/airflow
Apache Airflow
DanSedov/bdutil
DanSedov/dataproc-initialization-actions
Run on all nodes of your cluster before the cluster starts; lets you customize your cluster
DanSedov/incubator-gobblin
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google, etc.