data-engineering-pipeline
There are 208 repositories under the data-engineering-pipeline topic.
san089/Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, infrastructure setup in the cloud, Data Warehousing, and Data Lake development.
san089/goodreads_etl_pipeline
An end-to-end GoodReads data pipeline for building a Data Lake, a Data Warehouse, and an analytics platform.
vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
alanchn31/Movalytics-Data-Warehouse
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
anna-geller/dataflow-ops
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
anna-geller/prefect-deployment-patterns
Code examples showing flow deployment to various types of infrastructure
immu0001/Udacity-Data-Engineer-nanodegree
Classwork projects and homework completed during the Udacity Data Engineering Nanodegree
anki-code/xontrib-pipeliner
Let your pipe lines flow through the Python code in xonsh.
anna-geller/prefect-aws-lambda
Deploy a Prefect flow to a serverless AWS Lambda function
Framebuffers/Direwolf
Distributed Data Processing Pipeline for MCP.
mikeroyal/Apache-Spark-Guide
Apache Spark Guide
kishlayjeet/Stock-Market-Real-Time-Data-Pipeline-with-Apache-Kafka-and-Cassandra
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
InosRahul/f1-data-pipeline
F1 Data Pipeline
longNguyen010203/Youtube-Recommend-Master-ETL-Pipeline
A Data Engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
VeraZab/nyc-stats
Analysis of 311 service requests for the City of NYC (2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker
DarkStarStrix/DataVolt
Reusable data engineering toolkit: my personal data infrastructure
san089/data-engineer-roadmap
Learning from multiple companies in Silicon Valley: Netflix, Facebook, Google, and startups.
alero-awani/batch-data-engineering-project
A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.
nakuleshj/news-nlp-pipeline
A fully serverless, event-driven data pipeline that ingests, enriches, validates, and visualizes real-time news data using AWS services. Designed for cost-efficient, scalable deployment using only free-tier AWS services.
NitinDatta8/realtime-data-streaming
End-to-end data engineering pipeline that uses various technologies to ingest real-time data.
sanjeevai/disaster-response-pipeline
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
brunocampos01/predicting-retail-churn-with-azure-ml-studio
Job application challenge: Data Scientist
dylanzenner/business_closures_de_pipeline
Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database
kishlayjeet/Twitter-Data-Pipeline-using-Airflow-and-AWS-S3
An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.
minhky2185/healthcare_data_pipeline
An end-to-end data pipeline for building a Data Lake and supporting reporting, using Apache Spark.
ketgo/marshmallow-pyspark
Marshmallow serializer integration with pyspark
siddharth271101/Covid-19-and-Aviation-Industry
The goal of this project is to analyse the impact of Covid-19 on the aviation industry through data engineering processes, using technologies such as Apache Airflow, Apache Spark, Tableau, and a couple of AWS services
kkrusere/NHANES-pyTOOL-API
The NHANES Data 'API' is a Python tool that simplifies access to the National Health and Nutrition Examination Survey (NHANES) dataset. This project provides an easy-to-use API to retrieve NHANES data, helping researchers, data scientists, health professionals, and other stakeholders access these valuable datasets.
antimoz-om/Antimoz
A data engineering pipeline for digital marketers.
shankarlohar/dbt-snowflake-data-pipeline
🚀 A structured data pipeline project using dbt and Snowflake to transform raw data into curated datasets. This project covers data ingestion, cleansing, enrichment, Slowly Changing Dimensions (SCD Type 2), and analytical modeling to derive business insights.
AlphanAksoyoglu/tweeter-etl-pipeline
A streaming ETL pipeline for real-time tweet collection, analysis, and reporting
benedekrozemberczki/AV_Ultimate_Student_Hunt
Solution for the Ultimate Student Hunt Challenge (1st place).
datarootsio/notion-dbs-data-quality
Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.
koksang/social-media-analysis
Social media analysis: a scalable solution with flexible deployment that analyses social media content
anna-geller/prefect-getting-started
Get started with Prefect by scheduling your Prefect flows with GitHub Actions