An ETL job template that extracts, transforms, and loads data into HDFS using PySpark. The dataset used is from Kaggle: https://www.kaggle.com/karangadiya/fifa19/data#
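For orientation, a minimal sketch of what such a job might look like is below. The input file name, HDFS output path, and column names are assumptions for illustration, not taken from this repository:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("fifa19-etl")
    .getOrCreate()
)

# Extract: read the raw Kaggle CSV (file name is an assumption).
raw_df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Transform: keep a few columns, rename one, drop null ages (illustrative only).
clean_df = (
    raw_df
    .select("Name", "Age", "Nationality", "Overall")
    .withColumnRenamed("Overall", "overall_rating")
    .filter(F.col("Age").isNotNull())
)

# Load: write the result to HDFS as Parquet (path is an assumption).
clean_df.write.mode("overwrite").parquet("hdfs://localhost:9000/user/etl/fifa19")

spark.stop()
```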
To run the project you need Spark 2.4.5 and Hadoop 3.1.0 installed on your machine, with HDFS configured and all HADOOP and SPARK environment variables set correctly.
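As a quick sanity check before submitting the job, you can verify that the expected environment variables are visible to Python. The variable names below are the conventional Spark/Hadoop ones, not specific to this repository:

```python
import os

# Conventional Spark/Hadoop environment variables; adjust to your setup.
for var in ("SPARK_HOME", "HADOOP_HOME", "HADOOP_CONF_DIR"):
    value = os.environ.get(var)
    print(f"{var}: {value if value else 'NOT SET'}")
```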
Unit tests reside in the tests folder.
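A test in that folder might look like the following sketch. The fixture and the transformation under test (dropping rows with a null Age) are assumptions for illustration:

```python
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # Local SparkSession for tests; no HDFS needed.
    session = (
        SparkSession.builder
        .master("local[1]")
        .appName("tests")
        .getOrCreate()
    )
    yield session
    session.stop()


def test_null_ages_are_dropped(spark):
    # Hypothetical transformation: keep only rows with a non-null Age.
    df = spark.createDataFrame(
        [("Messi", 31), ("Unknown", None)],
        ["Name", "Age"],
    )
    result = df.filter(df.Age.isNotNull())
    assert result.count() == 1
```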