
Some exercises to learn Spark. Solved in Python.

Primary language: Python. License: MIT.

PySpark - Exercises

This is a collection of exercises for Spark solved in Python (PySpark).

Clone this repository to your local machine, then set up a virtual environment for its libraries:

  • Install virtualenv using pip > pip install virtualenv
  • Create a new virtual environment in this repo > virtualenv env
  • Activate the environment (on Linux/macOS) > source env/bin/activate

Install the dependencies by running:

pip install -r requirements.txt
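
After the installation, a quick sanity check can confirm that the setup worked. The sketch below is not part of the repository. It assumes that requirements.txt installs the pyspark package (the file's contents are not shown here), and the helper names in_virtualenv and missing_packages are made up for illustration. It uses only the Python standard library, so it runs even if the installation failed.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer releases (and the
    # stdlib venv module) make sys.prefix differ from sys.base_prefix.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: requirements.txt lists pyspark; adjust the list if it differs.
    print("virtual environment active:", in_virtualenv())
    print("missing packages:", missing_packages(["pyspark"]))
```

If the script reports that no virtual environment is active, or that pyspark is missing, repeat the setup steps above before starting the exercises.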

References

  1. Apache Spark official website: https://spark.apache.org/
  2. Exercises source: https://dbdmg.polito.it/wordpress/teaching/big-data-architectures-and-data-analytics-2019-2020