datawarehouse
There are 429 repositories under the datawarehouse topic.
DataLinkDC/dinky
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
hydradatabase/hydra
Hydra: Column-oriented Postgres. Add scalable analytics to your project in minutes.
getdozer/dozer
Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.
simbafl/DataWarehouse
From data warehouse to user profiling, from data construction to data application
Datavault-UK/automate-dv
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
shunfei/indexr
An open-source columnar data format designed for fast, real-time analytics on big data.
rdagumampan/yuniql
Free and open source schema versioning and database migration made natively with .NET/6. NEW THIS MAY 2022! v1.3.15 released!
cuebook/CueObserve
Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases
dataplane-app/dataplane
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
ErdemOzgen/Data-Engineering-Roadmap
Roadmap for Data Engineering
jitsucom/bulker
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
josephmachado/simple_dbt_project
Code for dbt tutorial
DragonKingpin/Hydra
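Hydra (the nine-headed dragon): a hands-on, step-by-step guide to building your own cross-platform, TB-to-PB-scale dedicated search engine, your own "eye of God". Hydra is an abstracted distributed operating system oriented toward cloud computing, multi-task scheduling, service communication, data warehousing, and microservices, illustrated by implementing a small crawler-based search engine.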
locnd-172/IBM-Data-Engineer-Specialization-Coursera-Personal-Note-Public
All of my individual learning materials, documents, and notes from the process of getting the Coursera IBM Data Engineer Professional Certificate specialization are stored in this repository.
MohamedHmini/tweetsOLAPing
Implementing an end-to-end tweet ETL/analysis pipeline.
josephmachado/online_store
End to end data engineering project
Matts966/alphasql
AlphaSQL provides integrated type and schema checking and parallelization for sets of SQL files, mainly for BigQuery
samber/awesome-olap
A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.
data-solution-automation-engine/data-solution-framework
A library for data warehouse and data integration pattern and architecture documentation.
glynnbird/couchwarehouse
Data warehouse for CouchDB
hifxit/dataligo
A library to accelerate ML and ETL pipelines by connecting all data sources
data-solution-automation-engine/virtual-data-warehouse
The Virtual Data Warehouse is a code generation and template management tool. It is part of the data solution automation ecosystem - the 'engine' for data solution automation.
umer7/Data-Warehouse-Concepts-Design-and-Data-Integration
Repo for Data Warehouse Concepts, Design, and Data Integration by University of Colorado System (Coursera): notes, assignments, quizzes, and research papers
dermatologist/pyomop
Python package for managing OHDSI clinical data models. Includes support for LLM based plain text queries!
data-solution-automation-engine/data-warehouse-automation-metadata-schema
Generic interface exchange format for Data Warehouse Automation and ETL generation.
dbecorp/snowflakecli
A DuckDB-powered command line interface for Snowflake security, governance, operations, and cost optimization.
SharpData/SharpETL
Write ETL using your favorite SQL dialects
kevchant/AzureDevOps-FabricDWDBProject
Template to perform CI/CD for Microsoft Fabric Data Warehouses
chenqingspring/rules-based-modeling-engine
A rule-based visual model-building engine. Supports metric definition, rule definition, integration of multiple data sources, and RESTful API queries
TauWu/backend_learning_notes
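Backend learning notes. This project collects my notes from reading technical books and some source code. It covers computer science fundamentals for backend development, high-level language basics, source-code reading notes, databases, data mining, and more, and also touches on practical problems encountered in real production scenarios. :-D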
KennethanCeyer/awesome-data-pipeline
Awesome list for data pipelines
varigence/BimlFlex-Community
Community-focused content to supplement working with BimlFlex.
Balajirvp/DE-Zoomcamp
Code/Notes for the Data Engineering Zoomcamp by DataTalksClub
kromozome2003/Snowflake-Json-DataPipeline
Building a JSON data pipeline within Snowflake using Streams and Tasks
data-solution-automation-engine/DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics control framework that can be used to monitor, log, audit and control data integration / ETL processes.
judeleonard/Prescriber-ETL-data-pipeline
An end-to-end ETL data pipeline that uses PySpark parallel processing to handle about 25 million rows of data from a SaaS application, with Apache Airflow for orchestration, various data warehouse technologies for storage, and Apache Superset connected to the data warehouse to generate BI dashboards for weekly reports