/test

Primary LanguagePython

Table of Contents
  1. Getting Started
  2. Usage
  3. Usage

Getting Started

Currently, the setup can only be done in a single machine.

Prerequisites

You will need a Google Cloud Platform account in order to setup BigQuery and the credentials in order for it to work

You will need python libraries and dependency manager such as pip. I use poetry

  • Poetry

    $ curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -

Installation

  1. Clone the repo

    $ git clone https://github.com/koksang/test.git
  2. To install dependencies, you can

    • poetry
      $ poetry install
      $ poetry shell

(back to top)

Usage

  1. Setup environment

    • Use .env
      $ touch .env
      $ echo "AIRFLOW_HOME=${YOUR PATH TO AIRFLOW_HOME}" >> .env
      $ echo "DBT_PROJECT_DIR=${YOUR DBT_PROJECT_DIR}" >> .env
      $ echo "DBT_PROFILES_DIR=${YOUR DBT_PROFILES_DIR}" >> .env
      $ echo "PROJECT_ID=${YOUR GCP PROJECT_ID}" >> .env
      $ echo "BUCKET=${YOUR GCS BUCKET}" >> .env
      $ set -a && source .env
  2. Start a local airflow cluster

    • Create airflow database

      $ airflow db init
      $ airflow users create -u admin -r Admin --password admin -e admin -f admin -l admin
    • Start airflow webserver

      $ airflow webserver
    • Start airflow scheduler

      $ airflow scheduler
    • Go to airflow and login using username: admin and password: admin

  3. To exit poetry shell, you can

    • poetry
      $ exit

(back to top)

Output

  1. Airflow Dag dbt_lineage

  2. DBT Lineage dbt_lineage

  3. Visualization Dashboard insights_report

(back to top)

Assumptions

  1. Any taxi driver can only have maximum 1 shift a day

  2. A shift is defined by the earliest trip_start_timestamp and latest trip_end_timestamp in a single day

(back to top)