
Files used for the Spark demo on my website, covering PySpark and sparklyr.

Primary language: Jupyter Notebook

Spark Demo

These files were part of a simple demo I created to illustrate how to run PySpark and sparklyr on AWS and connect them to Hive tables.
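As a rough illustration of the PySpark side, this sketch opens a Hive-enabled `SparkSession` and runs a Spark SQL query against a Hive table. It assumes it runs on a cluster where `pyspark` is installed and Hive is configured. The table name `hr_data` and the columns `salary` and `satisfaction_level` are assumptions based on the Kaggle HR file, not names taken from the demo notebooks.

```python
import re

# Accept plain or database-qualified identifiers, e.g. "hr_data" or "default.hr_data".
_IDENTIFIER = re.compile(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*)?")


def build_summary_query(table: str) -> str:
    """Return a Spark SQL query summarising satisfaction by salary band.

    The column names `salary` and `satisfaction_level` are assumptions
    based on the Kaggle HR file.
    """
    # Reject anything that is not a simple table name before interpolating it.
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"not a valid table name: {table!r}")
    return (
        "SELECT salary, COUNT(*) AS n, AVG(satisfaction_level) AS avg_satisfaction "
        f"FROM {table} GROUP BY salary ORDER BY n DESC"
    )


def query_hive(table: str = "hr_data"):
    """Run the summary query against a Hive table (`hr_data` is a hypothetical name)."""
    # Imported here so build_summary_query() can be used without Spark installed.
    from pyspark.sql import SparkSession

    # enableHiveSupport() lets Spark SQL read tables registered in the Hive metastore.
    spark = (
        SparkSession.builder
        .appName("spark-demo")
        .enableHiveSupport()
        .getOrCreate()
    )
    return spark.sql(build_summary_query(table))
```

On the cluster, something like `query_hive().show()` would print the per-salary summary. Keeping the query builder separate from the session code makes the SQL easy to check, or to paste into the Hive CLI, without starting Spark.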

These analyses use the HR data set on Kaggle (https://www.kaggle.com/ludobenistant/hr-analytics/data). Much more machine learning could be demonstrated with this data, but the focus of the demo was on spinning up the EMR cluster, importing data into Hive, and connecting Hive to PySpark and SparkR.
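For the "importing data into Hive" step, one common approach is to define an external Hive table over the uploaded CSV. The hypothetical helper below builds that `CREATE EXTERNAL TABLE` statement. The column list is an assumption based on the header of the Kaggle HR CSV, and the S3 location is a placeholder; this is a sketch, not necessarily how the demo loaded its data.

```python
# Assumed header of the Kaggle HR CSV (names kept exactly, including "montly").
HR_COLUMNS = [
    ("satisfaction_level", "DOUBLE"),
    ("last_evaluation", "DOUBLE"),
    ("number_project", "INT"),
    ("average_montly_hours", "INT"),
    ("time_spend_company", "INT"),
    ("Work_accident", "INT"),
    ("left", "INT"),
    ("promotion_last_5years", "INT"),
    ("sales", "STRING"),
    ("salary", "STRING"),
]


def create_table_ddl(table, columns, location):
    """Build HiveQL DDL for a comma-delimited text file with one header row."""
    # Backtick-quote every column so reserved words such as `left` are accepted.
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'\n"
        # Tell Hive to ignore the CSV header line when reading.
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


print(create_table_ddl("hr_data", HR_COLUMNS, "s3://example-bucket/hr/"))
```

The printed statement could be run in the Hive CLI or Beeline on the EMR master node. `LOCATION` must point at a directory containing the CSV, not at the file itself. Because the table is external, dropping it leaves the underlying data in place.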

Find the full analysis on my website at https://www.kyle-stahl-mn.com/using-spark-on-aws