data-engineering
There are 3012 repositories under the data-engineering topic.
apache/superset
Apache Superset is a Data Visualization and Data Exploration Platform
GokuMohandas/Made-With-ML
Learn how to design, develop, deploy and iterate on production-grade ML applications.
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
eugeneyan/applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
DataTalksClub/data-engineering-zoomcamp
Free Data Engineering course!
PrefectHQ/prefect
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
argoproj/argo-workflows
Workflow Engine for Kubernetes
airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
andkret/Cookbook
The Data Engineering Cookbook
datastacktv/data-engineer-roadmap
Roadmap to becoming a data engineer in 2021
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
great-expectations/great_expectations
Always know what to expect from your data.
Avaiga/taipy
Turns Data and AI algorithms into production-ready web applications in no time.
xonsh/xonsh
🐚 Python-powered, cross-platform, Unix-gazing shell.
benthosdev/benthos
Fancy stream processing made operationally mundane
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
kestra-io/kestra
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
risingwavelabs/risingwave
Cloud-native SQL stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.
cloudquery/cloudquery
The open source high performance ELT framework powered by Apache Arrow
growthbook/growthbook
Open Source Feature Flagging and A/B Testing Platform
feast-dev/feast
Feature Store for Machine Learning
treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
whoiskatrin/sql-translator
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
aws/aws-sdk-pandas
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
hemansnation/God-Level-Data-Science-ML-Full-Stack
A collection of scientific methods, processes, algorithms, and systems to build stories & models. Whether you are a fresher in the field or an experienced professional who wants to transition into Data Science & AI
ploomber/ploomber
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
evidence-dev/evidence
Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
memphisdev/memphis
Memphis.dev is a highly scalable and effortless data streaming platform
adilkhash/Data-Engineering-HowTo
A list of useful resources to learn Data Engineering from scratch
phidatahq/phidata
Build AI Assistants with memory, knowledge and tools.
datafold/data-diff
Compare tables within or across databases
Moataz-Elmesmary/Data-Science-Roadmap
Data Science Roadmap from A to Z
GokuMohandas/mlops-course
Learn how to design, develop, deploy and iterate on production-grade ML applications.
quadratichq/quadratic
Quadratic | Data Science Spreadsheet with Python & SQL
apache/incubator-devlake
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
jqnatividad/qsv
CSVs sliced, diced & analyzed.