dataengineering
There are 552 repositories under the dataengineering topic.
DataExpert-io/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
open-metadata/OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
datafold/data-diff
Compare tables within or across databases
TobikoData/sqlmesh
Efficient data transformation and modeling framework that is backwards compatible with dbt.
zinggAI/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Datavault-UK/automate-dv
A free-to-use dbt package for creating and loading Data Vault 2.0-compliant data warehouses (powered by dbt, an open source data engineering tool and registered trademark of dbt Labs)
Snowflake-Labs/snowpark-python-demos
This repository provides various demos/examples of using Snowpark for Python.
awslabs/aws-ddk
An open source development framework to help you build data workflows and modern data architecture on AWS.
kevinheavey/modern-polars
Code and data for the Modern Polars book
ErdemOzgen/Data-Engineering-Roadmap
Roadmap for Data Engineering
awslabs/aws-orbit-workbench
A Data Platform built for AWS, powered by Kubernetes.
ogbinar/DataEngineeringPilipinas
Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in the Philippines. Data Engineering Pilipinas is a PyData group.
kirilldikalin/myknowledge
Everything I have ever been asked in interviews, plus other useful knowledge in a concise format
sparsh-ai/recohut
Recohut - Learn data engineering, data science
Eldar1205/awesome-python-backend
Index for online reading materials in order to learn Python and backend development/engineering concepts from scratch and develop a mastery sufficient for Senior/Principal Backend Engineers and Data Engineers
josephmachado/beginner_de_project_stream
Simple stream processing pipeline
mehd-io/pypi-duck-flow
End-to-end data engineering project to get insights from PyPI using Python and DuckDB
TirendazAcademy/Awesome-Data-Science-Resources
Resources about data science, machine learning, deep learning, data engineering, and SQL.
Finance-And-ML/US-Stock-Prediction-Using-ML-And-Spark
Predict stock price based on financial news feeds
prodmodel/prodmodel
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
kislerdm/data-engineering-interviews
Data engineering interviews Q&A for data community by data community
minhadona/data_engineer_interview_challenges
Found a data engineering challenge or participated in a selection process? Share with us!
abhishek-ch/data-machinelearning-the-boring-way
Build and learn data engineering and machine learning over Kubernetes. No-shortcut approach.
noahgift/data-engineering-and-dataops
Duke MIDS: Data Engineering and DataOps Course
sbalnojan/run-a-data-team
A guide for leading a data (engineering) team
franloza/coches-net-dashboard
Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market
olist/work-at-olist-data
Apply for a job at Olist's Data Team: https://olist.gupy.io/
Wittline/apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
josephmachado/socialetl
Project for "Data pipeline design patterns" blog.
CynthiaKoopman/Forecasting-Solar-Energy
Forecasting Solar Power: Analysis of using an LSTM Neural Network
aakashnand/trino-ranger-demo
Tutorial on how to set up Trino and Apache Ranger using Docker
Spratiher9/SparkDataset
Instant search for and access to many datasets in PySpark.
danielsaban/data-scraping-sofascore
Data engineering and scraping project that creates a detailed relational sports database for the top European soccer leagues.