data-warehousing

Data warehouse tech stack with PostgreSQL, DBT and Airflow




Data Engineering Sensor Data

A fully dockerized ELT pipeline project, using PostgreSQL, dbt, Apache Airflow, and Redash.



Project Structure

images:

  • images/: the folder where all snapshots for the project are stored.

logs:

  • logs/: the folder where script logs are stored.

data:

  • data/: the folder where the DVC-versioned dataset pointer files (*.csv.dvc) are stored.

.dvc:

  • .dvc/: the folder where dvc is configured for data version control.

.github:

  • .github/: the folder where GitHub Actions and the CML workflow are configured.

models:

  • models/: the folder where the DBT model queries are stored.

notebooks:

  • eda.ipynb: a Jupyter notebook for exploring the data.
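The exploration in `eda.ipynb` typically starts with schema inspection and missing-value checks. A minimal sketch with pandas, using a hypothetical sensor-data frame (the real column names live in the DVC-versioned CSVs under data/, so the fields below are illustrative only):

```python
import pandas as pd

# Hypothetical sensor readings; the real data comes from the
# DVC-tracked CSV files in data/, and these columns are assumptions.
df = pd.DataFrame({
    "track_id": [1, 2, 3, 4],
    "speed": [32.5, 41.0, None, 28.7],
    "lat": [37.9778, 37.9780, 37.9782, 37.9779],
})

# Basic profiling: shape, dtypes, and per-column missing counts.
print(df.shape)         # (4, 3)
print(df.dtypes)
print(df.isna().sum())  # speed has 1 missing value

# Summary statistics for the numeric columns.
print(df.describe())
```

The same checks (row counts, null counts, value ranges) are what the dbt models downstream would rely on when staging the raw tables.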

root folder

  • requirements.txt: a text file listing the project's dependencies.
  • setup.py: a configuration file for installing the scripts as a package.
  • README.md: Markdown text with a brief explanation of the project and the repository structure.
  • Dockerfile: defines an automated build that executes several command-line instructions in a container.
  • docker-compose.yaml: integrates the various Docker containers and runs them in a single environment.

Installation guide

git clone https://github.com/isaaclucky/data-warehousing.git
cd data-warehousing
sudo python3 setup.py install

Tech Stack

Tech Stack used in this project

Getting Started

Prerequisites

Make sure you have Docker installed on your local machine.

  • Docker
  • DockerCompose

Installation

  1. Clone the repo
    git clone https://github.com/isaaclucky/data-warehousing.git
  2. Run
     docker-compose build
     docker-compose up
  3. Open the Airflow web UI
    Navigate to `http://localhost:8000/` in the browser
    Activate and trigger `dbt_dag`
    Activate and trigger `migrate`
  4. Access the DBT models and docs
     dbt docs serve --port 8081
     Navigate to `http://localhost:8081/` in the browser
  5. Access the Redash dashboard
    docker-compose up
    Navigate to `http://localhost:3500/` in the browser
  6. Access your PostgreSQL database using Adminer
    Navigate to `http://localhost:8080/` in the browser
    Choose the PostgreSQL database
    Use `airflow` for the username
    Use `airflow` for the password
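The same `airflow`/`airflow` credentials used in Adminer can also be used programmatically. A minimal sketch with SQLAlchemy — note that the host, port (5432), and database name `airflow` are assumptions based on the stack's usual defaults, not values confirmed by the repository:

```python
from sqlalchemy import create_engine, text

# Credentials from the Adminer step; host/port/database are assumed defaults.
DB_USER = "airflow"
DB_PASSWORD = "airflow"
DB_HOST = "localhost"
DB_PORT = 5432
DB_NAME = "airflow"

url = f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"

def fetch_one(query: str):
    """Run a single query against the warehouse and return the first row."""
    engine = create_engine(url)
    with engine.connect() as conn:
        return conn.execute(text(query)).fetchone()

# Example (requires the docker-compose stack to be running):
# row = fetch_one("SELECT version();")
```

This is the same database the Airflow DAGs load into and the dbt models transform, so ad-hoc queries here are a quick way to verify that `dbt_dag` and `migrate` produced the expected tables.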

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Yishak Tadele - @email Contact Me - @contact

Acknowledgements