
pyspark-etl-template

ETL job template to extract, transform, and load data into HDFS using PySpark. The sample dataset is the FIFA 19 file from Kaggle: https://www.kaggle.com/karangadiya/fifa19/data#
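The extract-transform-load flow can be sketched as follows. This is a minimal illustration, not the template's actual job code: the column name `Value`, the output path, and the helper names are assumptions based on the FIFA 19 dataset's format (currency strings like `€110.5M`). The pyspark imports are kept inside the functions so the pure parsing logic can be exercised without a Spark installation.

```python
def parse_value(v):
    """Convert FIFA 19 currency strings like '€110.5M' or '€565K' to euros.

    Returns None for empty or unrecognised input. Pure Python, so it is
    unit-testable without a SparkSession.
    """
    if not v or not v.startswith("€"):
        return None
    v = v[1:]
    if v.endswith("M"):
        return float(v[:-1]) * 1_000_000
    if v.endswith("K"):
        return float(v[:-1]) * 1_000
    return float(v)


def run_etl(spark, src="data.csv", dest="hdfs://localhost:9000/user/etl/fifa19"):
    """Extract a CSV, add a numeric value column, load to HDFS as Parquet.

    The source file, HDFS URL, and column names are illustrative assumptions.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    parse_value_udf = F.udf(parse_value, DoubleType())

    df = spark.read.option("header", "true").csv(src)          # extract
    df = df.withColumn("ValueEUR", parse_value_udf("Value"))   # transform
    df.write.mode("overwrite").parquet(dest)                   # load


def main():
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fifa19-etl").getOrCreate()
    try:
        run_etl(spark)
    finally:
        spark.stop()
```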

Getting started

Running the project requires Spark 2.4.5 and Hadoop 3.1.0 installed on your machine, with HDFS configured and all HADOOP and SPARK environment variables set correctly.
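A typical environment setup might look like the following. The install paths and the job entry-point name are assumptions; adjust them to your machine.

```shell
# Point the shell at the Spark and Hadoop installs (paths are assumptions)
export HADOOP_HOME=/opt/hadoop-3.1.0
export SPARK_HOME=/opt/spark-2.4.5
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export PATH="$SPARK_HOME/bin:$HADOOP_HOME/bin:$PATH"

# Start HDFS, then submit the job (entry-point name is hypothetical)
start-dfs.sh
spark-submit etl_job.py
```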

Testing

Unit tests reside in the tests folder.