
Course labs from a Berkeley course on Spark, written as Jupyter notebooks.


Spark-exercise-problems

Exercise problems from CS110. The notebooks can be run on a Databricks-hosted Spark instance at http://community.cloud.databricks.com. The data files can be obtained by following the comments in each notebook. Most of the data files are hosted internally on Databricks' S3 instance and can only be accessed through a Databricks notebook.

Caution: this code was written for Python 2.6 and Spark 1.6. Some of it needs to be changed to run under Spark 2.0, because RDD-to-DataFrame conversions must be handled differently there.