datawarehouse

There are 444 repositories under the datawarehouse topic.

  • Tilda

    Language: Java · 13
  • fabricks

    Language: Python · 5
  • system-big-data-movies-fr

    Language: Jupyter Notebook · 5
  • cortana-intelligence-customer360

    This repository contains instructions and code to deploy a customer 360 profile solution on Azure stack using the Cortana Intelligence Suite.

    Language: Python · 24
  • Datawarehouse

    Fully dockerized data warehouse (DWH) using Airflow, dbt, and PostgreSQL, with a dashboard built on Redash

    Language: Jupyter Notebook · 23
  • RStoolKit

    RStoolKit - A utility to perform a complete health check of your AWS Redshift cluster

    Language: PLSQL · 23
  • ETL-Project

    The goal of this project is to illustrate Extract, Transform, Load (ETL) using Python and SQL. ETL is a common data processing pattern that takes raw data, cleans it, and stores it for later use. The extract phase targets and retrieves the data. The transform phase manipulates and cleans the data. The load phase then stores the data, typically in a data warehouse.

    Language: Jupyter Notebook · 21
  • intelli-swift-core

    Distributed, column-oriented storage, real-time analysis, high-performance database

    Language: Java · 18
  • IMDB-DB-Dump-Projects

    Taking IMDb's database dumps and turning them into multiple projects

    Language: TSQL · 18
  • DDO

    A dbt package to perform DataOps and administrative CI/CD on your data warehouse.

  • data-brewery

    Data Brewery is an ETL (Extract-Transform-Load) program that connects to many data sources (cloud services, databases, ...) and manages data warehouse workflows.

    Language: Scala · 16
  • cobra-policytool

    Manage Apache Atlas and Ranger configuration for your Hadoop environment.

    Language: Python · 16
  • nifi-postgres-metabase

    Template for creating batch-based ETL workflows for data warehouses

    Language: PLpgSQL · 15
  • SparkETL

    Implement a complete data warehouse ETL using Spark SQL

    Language: Java · 14
  • Sentiment-analysis-from-MLOps-paradigm

    This project presents an automated end-to-end ML pipeline that trains a biLSTM network for sentiment analysis, with experiment tracking, benchmarking through model testing and evaluation, and promotion of the model to production followed by deployment to a cloud instance via CI/CD

    Language: Python · 13
  • AmazonMoviesDataWarehouse

    Data warehouse: stores and analyzes Amazon movie data from past years

    Language: Java · 13
  • data_ai_for_all

    Data Analysis, Analytics, Science, AI & ML, LLM etc.

    Language: Jupyter Notebook · 13
  • hephaestus

    Hephaestus - ETL and ML tools for OHDSI - OMOP CDM

    Language: Python · 13
  • MUST_HAVE_SKILLS

    This repo consists of all important concepts for data engineers.

    Language: Java · 11
  • Data-Modeling-with-Postgres

    A project to design a fact and dimension star schema for optimizing queries on a flight booking database using PostgreSQL, a relational database management system. This schema is well-suited for a flight booking database, as it allows for efficient querying of data such as booking dates, flight routes, and passenger information.

    Language: PLpgSQL · 10
  • Data-Warehouse-UKAccident

    Information system for business project - building and mining a data warehouse

    Language: TSQL · 9
  • hexbase

    Open-source ETL pipeline for HEX cryptocurrency data

    Language: Python · 9
  • vau

    Data Vault data model and ETL generator for Oracle Databases

    Language: Java · 9
  • Data-Warehouse-With-Redshift

    Data warehouse with AWS Redshift and data visualization using Power BI

    Language: Jupyter Notebook · 8
  • DataManager

    Better organize data in a data lake and build ETL pipelines with a web UI tool.

    Language: JavaScript · 8
  • GitHub-FabricDWDBProject

    Template to perform CI/CD for Microsoft Fabric Data Warehouses using GitHub Actions

    Language: TSQL · 7
  • Modern-Big-Data-Analysis-using-SQL

    RDBMS techniques for Big Data analysis

  • DateAndTimeDimensionBuilders

    Data warehousing date dimension and time dimension builders written in Python.

    Language: Python · 7
  • fabric-accelerator

    Accelerator to build a Microsoft Fabric modern data platform using ELT Framework https://github.com/bennyaustin/elt-framework

    Language: Python · 6
  • Data-Engineering-Project

    The Centralized Data Warehouse and ML Solution for Banking Analytics is a project that combines a centralized repository for banking data with machine learning algorithms to enable predictive analysis.

    Language: Jupyter Notebook · 6
  • CourseShop_DataWarehouse

    A data warehouse for an online course shop

    Language: TSQL · 6
  • data-warehouse

    Practice learning data warehousing

    Language: TSQL · 6
  • Business-Intelligence-on-Big-Data-_-U-TAD-2017-Big-Data-Master-Final-Project

    This is the final project I had to do to finish my Big Data Expert Program in U-TAD in September 2017. It uses the following technologies: Apache Spark v2.2.0, Python v2.7.3, Jupyter Notebook (PySpark), HDFS, Hive, Cloudera Impala, Cloudera HUE and Tableau.

    Language: Jupyter Notebook · 6
  • ETL-Data-Pipeline-using-AirFlow

    An ETL data pipeline project that uses Airflow DAGs to extract employee data from PostgreSQL schemas, load it into an AWS data lake, transform it with a Python script, and finally load it into a Snowflake data warehouse using SCD Type 2.

    Language: Python · 5
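
Several entries above describe ETL as three phases: extract retrieves raw data, transform cleans it, and load stores it in a warehouse. The following is a minimal sketch of that flow, not taken from any listed repository. The sample records, the `movies` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical raw records standing in for data pulled from a source system.
RAW_ROWS = [
    {"title": " Alien ", "year": "1979"},
    {"title": "Heat", "year": "1995"},
    {"title": "", "year": "2001"},     # invalid: empty title
    {"title": "Ran", "year": "n/a"},   # invalid: non-numeric year
]


def extract():
    """Extract: retrieve the raw records (here, a copy of an in-memory list)."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: trim whitespace, cast the year to int, drop invalid records."""
    cleaned = []
    for row in rows:
        title = row["title"].strip()
        year = row["year"].strip()
        if title and year.isdigit():
            cleaned.append((title, int(year)))
    return cleaned


def load(conn, rows):
    """Load: store the cleaned rows in a table (SQLite is an assumption here)."""
    conn.execute("CREATE TABLE IF NOT EXISTS movies (title TEXT, year INTEGER)")
    conn.executemany("INSERT INTO movies (title, year) VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three phases in order and return the loaded rows sorted by year."""
    load(conn, transform(extract()))
    return conn.execute("SELECT title, year FROM movies ORDER BY year").fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` runs the three phases against a throwaway database. Keeping each phase in its own function makes it easier to swap the source or the target without touching the cleaning logic.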