data-warehouse
There are 717 repositories under the data-warehouse topic.
PostHog/posthog
🦔 PostHog provides open-source web & product analytics, session recording, feature flagging and A/B testing that you can self-host. Get started - free.
oxnr/awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
rudderlabs/rudder-server
Privacy and Security focused Segment-alternative, in Golang and React
dlt-hub/dlt
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
hydradatabase/columnar
Postgres-native columnar storage extension
BlankerL/DXY-COVID-19-Data
COVID-19/2019-nCoV Infection Time Series Data Warehouse
elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
san089/Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Multiwoven/multiwoven
🔥🔥🔥 Open source Reverse ETL - alternative to Hightouch and Census.
DataBrewery/cubes
[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis
BemiHQ/BemiDB
Open-source Snowflake and Fivetran alternative bundled together
tensorbase/tensorbase
TensorBase is a new big data warehouse built with modern efforts.
cloudera/hue
Open source SQL Query Assistant service for Databases/Warehouses
GoogleCloudPlatform/bigquery-utils
Useful scripts, UDFs, views, and other utilities for migration and data warehouse operations in BigQuery.
scratchdata/scratchdata
Scratch is a Swiss Army knife for big data.
apache/cloudberry
An advanced and mature open-source MPP (Massively Parallel Processing) database. Open source alternative to Greenplum Database.
alanchn31/Data-Engineering-Projects
Personal Data Engineering Projects
raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
unytics/bigfunctions
Supercharge BigQuery with BigFunctions
Canner/vulcan-sql
Data API Framework for AI Agents and Data Apps
pixelsdb/pixels
An efficient storage and compute engine for both on-prem and cloud-native data analytics.
domainmod/domainmod
DomainMOD is an open source application written in PHP & MySQL used to manage your domains and other internet assets in a central location. DomainMOD also includes a Data Warehouse framework that allows you to import your web server data so that you can view, export, and report on your live data.
Titan-Systems/titan
Titan Core - Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.
vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
Canner/wren-engine
🤖 The Semantic Engine for Model Context Protocol (MCP) Clients and AI Agents 🔥
pracdata/awesome-open-source-data-engineering
A curated list of open source tools used in analytics platforms and the data engineering ecosystem
DataWithBaraa/sql-data-warehouse-project
A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
tuva-health/tuva
Main repo including core data model, data marts, data quality tests, and terminology sets.
intermine/intermine
A powerful open source data warehouse system
ubisoft/mobydq
🐳 Tool to automate data quality checks on data pipelines
GokuMohandas/data-engineering
Construct a modern data stack and orchestrate the workflows to create high quality data for analytics and ML applications.
unytics/airbyte_serverless
Airbyte made simple (no UI, no database, no cluster)
data-engineering-community/data-engineering-project-template
This is a template you can use for your next data engineering portfolio project.
dalenewman/Transformalize
Configurable Extract, Transform, and Load
google/space
Unified storage framework for the entire machine learning lifecycle
iam-mhaseeb/Skytrax-Data-Warehouse
A full data warehouse infrastructure with ETL pipelines running inside Docker, using Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase to serve data visualization needs such as analytical dashboards.