A simple example of ETL (Extract, Transform, Load) using Spark, Spark SQL and PySpark.
Update Homebrew first, then install Scala, Spark, Python 3 and PySpark.
brew update && brew upgrade
brew install scala
brew install apache-spark
brew install python3
pip3 install pyspark
PySpark needs to find the Spark installation, so define SPARK_HOME and add Spark's bin directory to your PATH.
vim ~/.bashrc
Insert these environment variables into the file and save it. Adjust the version number (3.0.1 here) to match the version Homebrew installed.
export SPARK_HOME=/usr/local/Cellar/apache-spark/3.0.1/libexec
export PATH=/usr/local/Cellar/apache-spark/3.0.1/bin:$PATH
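To confirm that SPARK_HOME is visible to Python in a new terminal session, a small check along these lines may help. This is only a sketch: the function name check_spark_env is hypothetical and is not part of Spark or PySpark.

```python
import os
from pathlib import Path


def check_spark_env(environ=os.environ):
    """Return a list of problems found; an empty list means SPARK_HOME looks usable."""
    problems = []
    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not Path(spark_home).is_dir():
        # The variable is set but points at a directory that is missing,
        # for example because the installed Spark version differs.
        problems.append(f"SPARK_HOME does not exist: {spark_home}")
    return problems


# Print any problems found in the current environment.
for problem in check_spark_env():
    print(problem)
```

If nothing is printed, SPARK_HOME is set and points at an existing directory.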
Open a new terminal, or reload the file with source ~/.bashrc, then run the ETL script:
python3 sales_etl.py
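The contents of sales_etl.py are not reproduced here. As a rough illustration of what an ETL job of this kind can look like with PySpark and Spark SQL, the sketch below assumes a hypothetical input file with region and amount columns. The function name run_etl, the view name sales, the column names and the Parquet output are all assumptions made for the example, not details of the actual script.

```python
# Hypothetical ETL sketch; this is not the repository's sales_etl.py.

# Assumed Spark SQL aggregation; the view and column names are illustrative.
SALES_BY_REGION_SQL = """
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
    ORDER BY region
"""


def run_etl(input_csv, output_dir):
    # Imported here so the module can be loaded even where Spark is missing.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sales_etl_sketch").getOrCreate()
    try:
        # Extract: read the CSV with a header row and inferred column types.
        sales = spark.read.csv(input_csv, header=True, inferSchema=True)
        # Transform: register a temporary view and aggregate with Spark SQL.
        sales.createOrReplaceTempView("sales")
        totals = spark.sql(SALES_BY_REGION_SQL)
        # Load: write the result as Parquet, replacing any previous output.
        totals.write.mode("overwrite").parquet(output_dir)
    finally:
        # Release the Spark session even if a step above fails.
        spark.stop()
```

Under these assumptions, calling run_etl("sales.csv", "output/sales_by_region") would read the CSV, total the amounts per region and write the result to the output directory.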