data-warehouse-migration-superset

A project for migrating a data-warehouse tech stack built with:

  • PostgreSQL
  • Airflow
  • dbt
  • Redash

The migration swaps PostgreSQL for MySQL and Redash for Apache Superset.
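
Concretely, the database swap means every component that previously spoke to PostgreSQL now needs a MySQL connection. Below is a minimal sketch of that change (my illustration, not code from this repo), assuming SQLAlchemy is used; the credentials, host, and warehouse database name are hypothetical.

    # Hypothetical connection-string change implied by the migration;
    # not taken from this repo. Requires: pip install sqlalchemy pymysql
    from sqlalchemy import create_engine, text

    # Before the migration: PostgreSQL
    # engine = create_engine("postgresql://user:password@localhost:5432/warehouse")

    # After the migration: MySQL, here via the PyMySQL driver
    engine = create_engine("mysql+pymysql://user:password@localhost:3306/warehouse")

    with engine.connect() as conn:
        print(conn.execute(text("SELECT 1")).scalar())  # connectivity check

The same substitution applies wherever a connection string appears, e.g. in Airflow connections and the dbt profile.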

About the data

The data originates as drone video footage tracking different vehicles on the road, which was then converted into a trajectory-description format. In our data each vehicle is described with 4 columns, and its trajectory with 6 repeating columns recorded at approximately 4-second time intervals.

For each .csv file, the following apply (a parsing sketch follows the list):

  • each row represents the data of a single vehicle
  • the first 10 columns in the 1st row include the columns’ names (track_id; type; traveled_d; avg_speed; lat; lon; speed; lon_acc; lat_acc; time)
  • the first 4 columns contain trajectory-level information: the unique track_id, the type of vehicle, the distance traveled in meters, and the average speed of the vehicle in km/h
  • the last 6 columns are then repeated every 6 columns based on the time frequency. For example, column_5 contains the latitude of the vehicle at time column_10, and column_11 contains the latitude of the vehicle at time column_16.
  • speed is in km/h, longitudinal and lateral acceleration in m/s², and time in seconds
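
Because the repeating layout is easy to misread, here is a minimal Python sketch (not from the repo) that unpivots the 6-column groups into one record per time step. The file name trajectories.csv is hypothetical, and the semicolon delimiter is an assumption based on the header listing above.

    # Minimal sketch, not from this repo: unpivot one-vehicle-per-row CSVs
    # into one record per time step. Semicolon delimiter assumed.
    import csv

    STATIC = ["track_id", "type", "traveled_d", "avg_speed"]
    REPEATED = ["lat", "lon", "speed", "lon_acc", "lat_acc", "time"]

    def read_trajectories(path):
        records = []
        with open(path, newline="") as f:
            reader = csv.reader(f, delimiter=";")
            next(reader)  # first row holds the 10 column names
            for raw in reader:
                fields = [v.strip() for v in raw if v.strip()]
                vehicle = dict(zip(STATIC, fields[:4]))
                # after the 4 static columns, every 6 fields describe one
                # time step: lat, lon, speed, lon_acc, lat_acc, time
                for i in range(4, len(fields) - 5, 6):
                    step = dict(zip(REPEATED, fields[i:i + 6]))
                    records.append({**vehicle, **step})
        return records

    # records = read_trajectories("trajectories.csv")

Each record then carries the vehicle's static attributes alongside a single (lat, lon, speed, lon_acc, lat_acc, time) sample, a long format that is easier to load into warehouse tables.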

Prerequisites

Make sure you have Docker installed on your local machine.

Installation

  1. Clone the repo

    git clone https://github.com/Hen0k/data-warehouse-migration-superset.git
  2. Run

     docker-compose build
     docker-compose up
  3. Open the Airflow web UI

    Navigate to http://localhost:8081/ in your browser, then activate and trigger all the DAGs

  4. Access dbt

    Navigate to http://localhost:8080/ in your browser (a quick smoke test for both services is sketched below)
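
To check that both services came up, here is a small Python smoke test (my own sketch, not part of the repo) that hits the two ports from steps 3 and 4:

    # Quick smoke test, not part of this repo: check that the Airflow and
    # dbt endpoints documented above respond at all.
    import urllib.request

    SERVICES = {
        "Airflow": "http://localhost:8081/",
        "dbt": "http://localhost:8080/",
    }

    for name, url in SERVICES.items():
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                print(f"{name} is up: HTTP {resp.status}")
        except OSError as exc:
            print(f"{name} is not reachable: {exc}")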

Roadmap

  • dummy task

Acknowledgement

License

Distributed under the MIT License. See LICENSE for more information.

Contributors

Henok Tilaye

Made with contrib.rocks.