olist-pyspark-elt-demo

A simple project using PySpark to transform data.

This project uses PySpark and Delta tables to load CSV files and transform them into a Delta table data lake.
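As a rough illustration of the load step, the sketch below reads one staged CSV file with PySpark and writes it out as a Delta table. It is not the project's actual code: the helper names (bronze_table_path, build_spark, csv_to_delta), the data/bronze target folder, and the read options are assumptions made for the example. It relies on the delta-spark package being installed alongside pyspark.

```python
from pathlib import Path


def bronze_table_path(csv_path, bronze_dir="data/bronze"):
    """Map a staged CSV file to the folder of its Delta table.

    The data/bronze layout is a hypothetical choice for this sketch.
    """
    return str(Path(bronze_dir) / Path(csv_path).stem)


def build_spark(app_name="olist-elt"):
    """Create a SparkSession with the Delta Lake extensions enabled."""
    # Imported here so the path helper above works without Spark installed.
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = (
        SparkSession.builder.appName(app_name)
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config(
            "spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        )
    )
    return configure_spark_with_delta_pip(builder).getOrCreate()


def csv_to_delta(spark, csv_path, bronze_dir="data/bronze"):
    """Read one CSV file with a header row and overwrite its Delta table."""
    df = (
        spark.read.option("header", True)
        .option("inferSchema", True)
        .csv(str(csv_path))
    )
    target = bronze_table_path(csv_path, bronze_dir)
    df.write.format("delta").mode("overwrite").save(target)
    return target
```

With a session from build_spark(), calling csv_to_delta(spark, "data/stage/olist_orders_dataset.csv") would write the table under data/bronze/olist_orders_dataset. The file name is only an example of a staged CSV.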

I used the Olist dataset. Download it and extract the CSV files into the /data/stage folder.
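Before running the pipeline, it can help to confirm that the CSV files landed in the staging folder. The sketch below is one way to do that with the standard library. The function name list_stage_csvs is hypothetical, and it assumes the stage path is relative to the project root.

```python
from pathlib import Path


def list_stage_csvs(stage_dir="data/stage"):
    """Return the sorted names of the CSV files found in the staging folder."""
    stage = Path(stage_dir)
    if not stage.is_dir():
        raise FileNotFoundError(f"staging folder not found: {stage}")
    return sorted(p.name for p in stage.glob("*.csv"))
```

An empty list means the folder exists but the dataset has not been extracted into it yet.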

Use Poetry to configure the project.

poetry lock
poetry install

The project uses a Papermill workflow to run the pipeline.

Go to the utils folder:

cd src/utils

Run the command below to execute all three layers. To run only some of the layers, add or remove the parameters brz, slv, and gld.

python orchestration.py brz slv gld
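For readers curious how such an entry point might work, here is a minimal sketch of a layer selector that runs one notebook per layer through Papermill. It is not the project's orchestration.py: the notebook file names, the ../notebooks and ../output folders, and the choice to always run layers in brz, slv, gld order are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical mapping from layer flag to notebook; the real names may differ.
LAYER_NOTEBOOKS = {
    "brz": "bronze.ipynb",
    "slv": "silver.ipynb",
    "gld": "gold.ipynb",
}


def select_layers(args):
    """Return the requested layers in pipeline order, rejecting unknown flags."""
    unknown = [a for a in args if a not in LAYER_NOTEBOOKS]
    if unknown:
        raise ValueError(f"unknown layer(s): {', '.join(unknown)}")
    # Iterating over the mapping keeps the brz -> slv -> gld order,
    # whatever order the flags were given in.
    return [layer for layer in LAYER_NOTEBOOKS if layer in args]


def run_layers(layers, notebook_dir=Path("../notebooks"), output_dir=Path("../output")):
    """Execute the notebook of each selected layer with Papermill."""
    if not layers:
        return
    # Imported here so select_layers can be used without papermill installed.
    import papermill as pm

    output_dir.mkdir(parents=True, exist_ok=True)
    for layer in layers:
        name = LAYER_NOTEBOOKS[layer]
        # The executed copy of the notebook is saved in the output folder.
        pm.execute_notebook(str(notebook_dir / name), str(output_dir / name))


def main(argv):
    """Entry point: pass the command-line arguments without the script name."""
    run_layers(select_layers(argv))
```

A script built this way would call main(sys.argv[1:]) under an `if __name__ == "__main__":` guard, so that `python orchestration.py slv gld` runs only the last two layers.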