This is a collection of exercises for Spark solved in Python (PySpark).
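- Install virtualenv using pip
> pip install virtualenv
- Create a new virtual environment in this repo, activate it (Linux/macOS command shown; on Windows run env\Scripts\activate instead), and install the dependencies
> virtualenv env
> source env/bin/activate
> pip install -r requirements.txt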
- Apache Spark official website: https://spark.apache.org/
- Exercises source: https://dbdmg.polito.it/wordpress/teaching/big-data-architectures-and-data-analytics-2019-2020
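
To check that the virtual environment from the setup steps is the one actually in use, a minimal sketch like the following may help. It relies only on the standard library. The function name `in_virtualenv` is a hypothetical helper and is not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment.

    Hypothetical helper for illustration; not part of the exercises.
    """
    # Older virtualenv releases expose the original prefix as sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases (and the built-in venv module) make
    # sys.prefix differ from sys.base_prefix inside the environment.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this reports that no environment is active, `pip install -r requirements.txt` would install the dependencies into the global interpreter rather than into `env`.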