The aim of this project is to create a scalable data warehouse that hosts the vehicle trajectory data extracted by analysing footage taken by swarm drones and static roadside cameras. The data warehouse is built by applying ELT data engineering methods with PostgreSQL, dbt, Redash and Airflow.
The data source is the pNEUMA dataset. The data is provided as CSV files sorted chronologically, named by the day the footage was captured, the region the vehicle was driving in and the time the records were taken. Within each CSV file the values are separated by semicolons. The first four columns uniformly describe each vehicle; the next six columns are repeated for every record (image) taken by the drones, and the last of those six columns is the time the record was taken.
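For illustration, a minimal sketch of reading one of these files is shown below. The grouping of columns is inferred from the description above; the function and variable names are placeholders, not the dataset's official headers.

```python
import csv

def read_pneuma_rows(path):
    """Yield (vehicle_info, trajectory_records) for each line of a pNEUMA CSV."""
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter=";"):
            values = [v.strip() for v in row if v.strip()]
            if not values:
                continue
            vehicle_info = values[:4]          # the four fixed leading columns
            rest = values[4:]
            # the remaining fields repeat in groups of six; the sixth is the time
            trajectory_records = [rest[i:i + 6] for i in range(0, len(rest), 6)]
            yield vehicle_info, trajectory_records
```

Each yielded record keeps the raw string values; casting to numeric types is left to the load or transform step.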
This repository contains the code used to build an ELT data pipeline in accordance with the tech stack flow shown above.
- Docker - https://docs.docker.com/engine/install/
- Requirements - requirements.txt
- Docker compose file - https://github.com/Stella-Mutacho/Data-warehouse-tech-stack-with-MySQL-DBT-Airflow/blob/main/airflow-docker/docker-compose.yml
- Airflow DAGs - https://github.com/Stella-Mutacho/Data-warehouse-tech-stack-with-MySQL-DBT-Airflow/tree/main/airflow-docker/dags
- Extract script - https://github.com/Stella-Mutacho/Data-warehouse-tech-stack-with-MySQL-DBT-Airflow/blob/main/scripts/extract.py
Prerequisites
Make sure you have Docker installed on your local machine.
- Docker
- Docker Compose
Installation
1. Clone the repo
   `git clone https://github.com/Stella-Mutacho/Data-warehouse-tech-stack-with-MySQL-DBT-Airflow`
2. Navigate to the data warehouse directory
   `cd sensor_data`
3. Run docker compose
   `docker-compose up`
To access and modify the default configurations for each tool, use the .env files.
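As a rough illustration, the Postgres and Airflow containers are typically configured with variables like the ones below; the actual keys and values live in this repo's .env files and may differ.

```
# Illustrative defaults only; check the repo's .env files for the real keys
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflow
POSTGRES_DB=airflow
AIRFLOW__CORE__EXECUTOR=LocalExecutor
```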
Navigate to `http://localhost:8080/` on the browser
- use `mysqldb` server
- use `airflow` database
- use `airflow` for username
- use `airflow` for password

Navigate to `http://localhost:8080/` on the browser
- use `postgres-dbt` server
- use `airflow` database
- use `airflow` for username
- use `airflow` for password
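To illustrate the load step into the `postgres-dbt` warehouse, here is a minimal sketch using pandas and SQLAlchemy. The host/port, the already-flattened input file and the table name `raw_trajectories` are assumptions; the credentials mirror the defaults listed above.

```python
import pandas as pd
from sqlalchemy import create_engine

# assumed connection string; adjust host/port to match your docker-compose setup
engine = create_engine("postgresql://airflow:airflow@localhost:5432/airflow")

# assumed pre-flattened trajectory CSV produced by the extract script
df = pd.read_csv("flattened_trajectories.csv", sep=";")
df.to_sql("raw_trajectories", engine, if_exists="append", index=False)
```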
Airflow is used for scheduling and automation.
Navigate to `http://localhost:8080/` on the browser
- use `airflow` for username
- use `airflow` for password
dbt performs the T in the ELT process, transforming the data in the warehouse.
Airflow is used to automate running and testing the dbt models.
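As a sketch of how that automation can look (assuming Airflow 2.x and a dbt project mounted at `/opt/airflow/dbt`; both are assumptions, not this repo's exact layout):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_transform",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # run the dbt models, then test them
    dbt_run = BashOperator(task_id="dbt_run", bash_command="cd /opt/airflow/dbt && dbt run")
    dbt_test = BashOperator(task_id="dbt_test", bash_command="cd /opt/airflow/dbt && dbt test")
    dbt_run >> dbt_test
```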
Navigate to https://sensordataelt.herokuapp.com/index.html to access the dbt docs.
Open a terminal and execute `docker-compose run --rm server create_db`
Using Adminer, create a user and grant read access.
Navigate to `http://localhost:5000/` on the browser to access Redash.
Navigate to `http://localhost:8088/` to access Airflow.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Fork the Project
Create your Branch (git checkout -b feature/NewFeature)
Commit your Changes (git commit -m 'Add some NewFeature')
Push to the Branch (git push origin feature/NewFeature)
Open a Pull Request