
data-warehouse-dbt-airflow-postgress-redash


A data-warehouse built for the pNEUMA open dataset of naturalistic trajectories of half a million vehicles collected by a swarm of drones in a congested downtown area of Athens, Greece.

About the data

The data originates from video feeds of drones tracking vehicles on the road, which were then converted into a trajectory format. In the resulting data, each vehicle is described by 4 columns, and its trajectory by groups of 6 columns that repeat at a fixed time interval of approximately 0.04 seconds.

For each .csv file the following apply:

  • each row represents the data of a single vehicle
  • the first 10 columns in the 1st row include the columns’ names (track_id; type; traveled_d; avg_speed; lat; lon; speed; lon_acc; lat_acc; time)
  • the first 4 columns contain information about the vehicle: the unique track_id, the type of vehicle, the distance traveled in meters, and the average speed in km/h
  • the remaining 6 columns (lat, lon, speed, lon_acc, lat_acc, time) repeat once per time step. For example, column_5 contains the latitude of the vehicle at the time given in column_10, and column_11 contains the latitude at the time given in column_16.
  • Speed is in km/h, longitudinal and lateral acceleration in m/s², and time in seconds.
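Given the layout above, each row splits into a 4-column vehicle record followed by repeating 6-column trajectory samples. A minimal parsing sketch (the sample row values below are made up for illustration; the `;` separator is assumed from the header format):

```python
import csv
from io import StringIO

# Hypothetical sample in the format described above: 4 vehicle columns,
# then repeating groups of 6 trajectory columns (lat, lon, speed,
# lon_acc, lat_acc, time). The numeric values are illustrative only.
sample = (
    "track_id; type; traveled_d; avg_speed; lat; lon; speed; lon_acc; lat_acc; time\n"
    "1; Car; 48.85; 9.77; 37.977391; 23.737688; 4.9178; 0.0518; -0.0299; 0.0; "
    "37.977391; 23.737689; 4.9207; -0.0163; -0.0227; 0.04\n"
)

def parse_rows(lines):
    """Yield (vehicle, trajectory_points) tuples from pNEUMA-style rows."""
    reader = csv.reader(lines, delimiter=";")
    next(reader)  # skip the header row
    for row in reader:
        # Strip whitespace and drop empty trailing fields.
        fields = [f.strip() for f in row if f.strip()]
        vehicle = {
            "track_id": int(fields[0]),
            "type": fields[1],
            "traveled_d": float(fields[2]),  # meters
            "avg_speed": float(fields[3]),   # km/h
        }
        points = []
        # Walk the repeating 6-column groups starting at column 5 (index 4).
        for i in range(4, len(fields) - 5, 6):
            lat, lon, speed, lon_acc, lat_acc, t = map(float, fields[i:i + 6])
            points.append({"lat": lat, "lon": lon, "speed": speed,
                           "lon_acc": lon_acc, "lat_acc": lat_acc, "time": t})
        yield vehicle, points

for vehicle, points in parse_rows(StringIO(sample)):
    print(vehicle["track_id"], len(points))  # → 1 2
```

The loader module in this project could normalize these rows into two relational tables (vehicles and trajectory points) before loading into Postgres.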

Installation

Roadmap

  • Create the data extraction and loading module
  • Containerize the module
  • Run Airflow in a container
  • Modify the compose file to use multiple database users
  • Create an Airflow DAG with a DockerOperator
  • Test that the workflow actually populates the containerized database
  • Locally install dbt
  • Connect dbt to the db and run models
  • Install and connect Redash
  • Create sample visualization
  • Containerize Redash with the rest
  • Run dbt models as Airflow DAGs
  • Generate dbt docs with Airflow
  • Containerize dbt with the rest
  • Create dev, staging, and production areas for the database

Acknowledgement

Contributors

Henok Tilaye

Made with contrib.rocks.