A simple project that uses PySpark to transform data.
This project uses PySpark and Delta Lake to load CSV files and transform them into a Delta table data lake.
It uses the Olist dataset. Download it and extract the CSV files into the /data/stage folder.
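As a rough sketch of what the bronze-layer ingestion looks like, the snippet below reads one raw CSV from the stage folder and persists it as a Delta table. The folder layout, table name, and function names here are illustrative assumptions, not the project's actual code (the real project drives these steps through Papermill notebooks):

```python
# Sketch of a bronze-layer ingestion step: read a raw CSV from the stage
# folder and persist it as a Delta table. Paths and names are assumptions.

def stage_path(table: str) -> str:
    """Location of a raw CSV in the stage folder (assumed layout)."""
    return f"data/stage/{table}.csv"

def bronze_path(table: str) -> str:
    """Destination of the bronze Delta table (assumed layout)."""
    return f"data/bronze/{table}"

def ingest_csv_to_delta(spark, table: str) -> None:
    """Load one CSV with header and schema inference, then overwrite
    the corresponding bronze Delta table."""
    df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(stage_path(table))
    )
    df.write.format("delta").mode("overwrite").save(bronze_path(table))
```

This assumes an active SparkSession configured with the Delta Lake extensions (for example, via the `delta-spark` package's `configure_spark_with_delta_pip` helper).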
Use Poetry to set up the project:
poetry lock
poetry install
The pipeline runs through a Papermill workflow. Change to the utils directory:
cd src/utils
Run the command below to execute all three layers. You can run any subset of layers by adding or removing the parameters brz, slv, and gld.
python orchestration.py brz slv gld
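To illustrate how such an orchestration script might dispatch the layer arguments, here is a minimal sketch. The notebook paths and the Papermill call are assumptions about the project's layout, not the actual contents of orchestration.py:

```python
# Sketch of a layer dispatcher: filter the command-line arguments down to
# known layer names and run one notebook per layer via Papermill.
import sys

LAYERS = ("brz", "slv", "gld")  # bronze, silver, gold

def select_layers(args):
    """Keep only recognized layer names, in medallion (brz -> slv -> gld) order."""
    requested = {a for a in args if a in LAYERS}
    return [layer for layer in LAYERS if layer in requested]

def run_layer(layer: str) -> None:
    # Imported here so the sketch stays importable without papermill installed.
    import papermill as pm
    # Hypothetical notebook locations; the real paths may differ.
    pm.execute_notebook(f"../{layer}.ipynb", f"../output/{layer}_out.ipynb")

if __name__ == "__main__":
    for layer in select_layers(sys.argv[1:]):
        run_layer(layer)
```

Running all three layers in bronze-silver-gold order matters because each layer reads the Delta tables produced by the previous one.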