Pinned Repositories
airflow-spark-operator-plugin
A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator
aws_batch_spark
for AWS Batch ETL Container Base of Spark Environment Using ECR or EKS
azure-func-container-app
cdh-edge-docker
Data-Pipelines-with-Apache-Airflow
Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3
dbt3
DBT-3 database benchmark
documents
Slides produced by Engineers and Data Scientists of Blue Yonder
h2o-3
Open Source Fast Scalable Machine Learning API For Smarter Applications (Deep Learning, Gradient Boosting, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles...)
kafka
Kafka streaming project
sangjaerijae.github.io
mkdocs
sangjaerijae's Repositories
sangjaerijae/kafka
Kafka streaming project
sangjaerijae/airflow-spark-operator-plugin
A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator
sangjaerijae/aws_batch_spark
for AWS Batch ETL Container Base of Spark Environment Using ECR or EKS
sangjaerijae/azure-func-container-app
sangjaerijae/cdh-edge-docker
sangjaerijae/Data-Pipelines-with-Apache-Airflow
Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3
sangjaerijae/dbt3
DBT-3 database benchmark
sangjaerijae/documents
Slides produced by Engineers and Data Scientists of Blue Yonder
sangjaerijae/h2o-3
Open Source Fast Scalable Machine Learning API For Smarter Applications (Deep Learning, Gradient Boosting, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles...)
sangjaerijae/sangjaerijae.github.io
mkdocs
sangjaerijae/fluent-bit-kubernetes-logging
Fluent Bit Kubernetes Daemonset
sangjaerijae/fluent-plugin-graphite
sangjaerijae/Hive-JSON-Serde
Read - Write JSON SerDe for Apache Hive.
sangjaerijae/jdbc_to_gcs_airflow_plugin
This Airflow plugin provides an operator that moves data from DBs to Google Cloud Storage using JdbcHook.
sangjaerijae/kubernetes-the-hard-way
Bootstrap Kubernetes the hard way on Google Cloud Platform. No scripts.
sangjaerijae/liblinear-java
Java version of LIBLINEAR
sangjaerijae/python_etc
python, anaconda, machine-learning, jupyter, pip
sangjaerijae/sampleportal
github start project
sangjaerijae/sarama
Sarama is a Go library for Apache Kafka 0.8 and up.
sangjaerijae/spark-dependencies
Spark job for dependency links
sangjaerijae/spark_etl_json
Spark sample ETL
sangjaerijae/yanagishima
Web UI for Presto, Hive, Elasticsearch, SparkSQL