cloudcruncher's Stars
raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
manuzhang/awesome-streaming
A curated list of awesome streaming frameworks, applications, etc.
abhishek-ch/around-dataengineering
A Data Engineering & Machine Learning Knowledge Hub
manoj9788/spark-etl-tests
A sample repository showcasing how to implement tests for ETL pipelines developed with Apache Spark.
vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
damklis/DataEngineeringProject
Example end-to-end data engineering project.
Rishav273/kafkaPysparkAnalytics
Real-time ETL pipeline for financial data (Kafka, PySpark).
benchsci/tinsel
PySpark schema generator
DataTalksClub/data-engineering-zoomcamp
Free Data Engineering course!
piskvorky/smart_open
Utils for streaming large files (S3, HDFS, gzip, bz2...)
GoogleCloudDataproc/initialization-actions
Initialization actions run on all nodes of your cluster before the cluster starts, letting you customize the cluster.
san089/Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
GoogleCloudPlatform/professional-services
Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.
GoogleCloudPlatform/training-data-analyst
Labs and demos for courses for GCP Training (http://cloud.google.com/training).