A project for migrating a data-warehouse tech stack built with:
- PostgreSQL
- Airflow
- dbt
- Redash
The migration swaps PostgreSQL for MySQL and Redash for Apache Superset.
The data starts as a video feed from drones tracking different vehicles on the road, which was then converted into a trajectory-describing format. In our data the vehicles are described with 4 columns, and the trajectories with 6 repeating columns that repeat at approximately 4-second time intervals.
For each .csv file the following apply:
- each row represents the data of a single vehicle
- the first row contains the names of the first 10 columns (track_id; type; traveled_d; avg_speed; lat; lon; speed; lon_acc; lat_acc; time)
- the first 4 columns contain information about the trajectory: the unique track_id, the type of vehicle, the distance traveled in meters, and the average speed of the vehicle in km/h
- the remaining columns repeat in groups of 6 based on the time frequency. For example, column_5 contains the latitude of the vehicle at the time in column_10, and column_11 contains the latitude at the time in column_16.
- speed is in km/h, longitudinal and lateral acceleration in m/s², and time in seconds (a parsing sketch follows this list)
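For concreteness, here is a minimal Python sketch of how one of these wide, row-per-vehicle files could be reshaped into one record per vehicle per time step. The file name and the semicolon delimiter are assumptions for illustration; adjust them to match the files you actually download.

```python
import csv

# Column groups as described above: 4 per-vehicle columns, then repeating
# groups of 6 trajectory columns.
VEHICLE_COLS = ["track_id", "type", "traveled_d", "avg_speed"]
TRAJECTORY_COLS = ["lat", "lon", "speed", "lon_acc", "lat_acc", "time"]

def read_long_rows(path: str):
    """Yield one dict per (vehicle, time step) pair from a wide pNEUMA-style CSV."""
    with open(path, newline="") as f:
        # Assumption: values are semicolon-separated, as the header suggests.
        reader = csv.reader(f, delimiter=";")
        next(reader)  # skip the header row with the 10 column names
        for row in reader:
            row = [cell.strip() for cell in row if cell.strip() != ""]
            vehicle = dict(zip(VEHICLE_COLS, row[:4]))
            # Remaining cells repeat in groups of 6: lat, lon, speed, lon_acc, lat_acc, time.
            for i in range(4, len(row) - 5, 6):
                yield {**vehicle, **dict(zip(TRAJECTORY_COLS, row[i:i + 6]))}

if __name__ == "__main__":
    # "sample_trajectories.csv" is a placeholder file name.
    for record in read_long_rows("sample_trajectories.csv"):
        print(record)
        break  # print only the first reshaped record
```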
Make sure you have Docker installed on your local machine.
- Docker: Install guide
- Docker-Compose: Install guide
- Clone the repo:
  git clone https://github.com/Hen0k/data-warehouse-migration-superset.git
- Run:
  docker-compose build
  docker-compose up
- Open the Airflow web UI:
  Navigate to http://localhost:8081/ in your browser, then unpause and trigger all the DAGs (a scripted alternative is sketched below).
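If you prefer to script this step, the following sketch unpauses and triggers one DAG through Airflow's stable REST API. It assumes the basic-auth API backend is enabled in this compose setup, that the credentials are the default airflow/airflow, and uses a placeholder DAG id; replace these with your own values.

```python
import requests

# Sketch: unpause and trigger one DAG via Airflow's stable REST API.
# Assumptions (adjust to your setup): the basic-auth API backend is enabled,
# the webserver is reachable on port 8081, the credentials are airflow/airflow,
# and "example_dag" is a placeholder DAG id.
BASE_URL = "http://localhost:8081/api/v1"
AUTH = ("airflow", "airflow")
DAG_ID = "example_dag"

# Unpause the DAG so it is allowed to run.
resp = requests.patch(f"{BASE_URL}/dags/{DAG_ID}", json={"is_paused": False}, auth=AUTH)
resp.raise_for_status()

# Trigger a new DAG run with an empty configuration.
resp = requests.post(f"{BASE_URL}/dags/{DAG_ID}/dagRuns", json={"conf": {}}, auth=AUTH)
resp.raise_for_status()
```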
- Access dbt:
  Navigate to http://localhost:8080/ in your browser.
- dummy task
- Data source: pNEUMA – open-traffic.epfl.ch
- Airflow-with-docker-setup: Medium blog by Marvin Lanhenke
- Airflow-DockerOperator-with-dockercompose: Medium blog by Flávio Clésio
Distributed under the MIT License. See LICENSE for more information.
Made with contrib.rocks.