gb_de_home_assignment

Prerequisites

Install PySpark on Windows by following this guide:

https://sparkbyexamples.com/pyspark/how-to-install-and-run-pyspark-on-windows/

Install the required packages in your Python environment

  1. conda activate <your_python_env>
  2. pip install -r requirements.txt
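
To confirm the environment is ready before running the program, a quick check along these lines may help. This is a minimal sketch: the helper name missing_packages is hypothetical, and the package list is an assumption, since only pyspark is named in this README. Extend the list to mirror your requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version


def missing_packages(names):
    """Return the package names from `names` that are not installed in the active environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: pyspark is the only package known from this README;
    # add the other entries from requirements.txt as needed.
    required = ["pyspark"]
    absent = missing_packages(required)
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All listed packages are installed.")
```

Run it with the conda environment activated, so that it inspects the same interpreter that will later run main.py.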

Download the data files and extract them into the project's ./data folder
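
A small check that the extraction landed in the expected place can save a failed run. The sketch below assumes the script is started from the project root, so that the relative path "data" resolves to ./data. The helper name list_data_files is hypothetical, and no particular file names or formats are assumed.

```python
from pathlib import Path


def list_data_files(data_dir="data"):
    """Return a sorted list of files under data_dir, or an empty list if the folder is missing."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    # Walk subfolders too, in case the archive extracted into nested directories.
    return sorted(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    files = list_data_files()
    if not files:
        print("No files found under ./data; extract the data files there first.")
    else:
        print(f"Found {len(files)} file(s) under ./data:")
        for path in files:
            print(" ", path)
```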

Running the program

  1. conda activate <your_python_env>
  2. python main.py (run from the project root folder)