
pyspark-etl-template

ETL job template to extract, transform, and load data into HDFS using PySpark. The sample dataset is the FIFA 19 file from Kaggle: https://www.kaggle.com/karangadiya/fifa19/data#
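The extract-transform-load flow can be sketched as follows. This is a minimal illustration, not the template's actual job code: the column name `Value`, the output path, and the helper names are assumptions based on the FIFA 19 dataset's format (currency strings like `€110.5M`). The pyspark imports are kept inside the functions so the pure parsing logic can be exercised without a Spark installation.

```python
def parse_value(v):
    """Convert FIFA 19 currency strings like '€110.5M' or '€565K' to euros.

    Returns None for empty or unrecognised input. Pure Python, so it is
    unit-testable without a SparkSession.
    """
    if not v or not v.startswith("€"):
        return None
    v = v[1:]
    if v.endswith("M"):
        return float(v[:-1]) * 1_000_000
    if v.endswith("K"):
        return float(v[:-1]) * 1_000
    return float(v)


def run_etl(spark, src="data.csv", dest="hdfs://localhost:9000/user/etl/fifa19"):
    """Extract a CSV, add a numeric value column, load to HDFS as Parquet.

    The source file, HDFS URL, and column names are illustrative assumptions.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    parse_value_udf = F.udf(parse_value, DoubleType())

    df = spark.read.option("header", "true").csv(src)          # extract
    df = df.withColumn("ValueEUR", parse_value_udf("Value"))   # transform
    df.write.mode("overwrite").parquet(dest)                   # load


def main():
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fifa19-etl").getOrCreate()
    try:
        run_etl(spark)
    finally:
        spark.stop()
```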

Getting started

Running the project requires Spark 2.4.5 and Hadoop 3.1.0 installed on your machine, with HDFS configured and all HADOOP and SPARK environment variables set correctly.
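A typical environment setup might look like the following. The install paths and the job entry-point name are assumptions; adjust them to your machine.

```shell
# Point the shell at the Spark and Hadoop installs (paths are assumptions)
export HADOOP_HOME=/opt/hadoop-3.1.0
export SPARK_HOME=/opt/spark-2.4.5
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export PATH="$SPARK_HOME/bin:$HADOOP_HOME/bin:$PATH"

# Start HDFS, then submit the job (entry-point name is hypothetical)
start-dfs.sh
spark-submit etl_job.py
```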

Testing

Unit tests reside in the tests folder.