Simple exercises to improve my knowledge of PySpark. Exercises include:
- Classifying emails as spam or ham
- Finding out how many clusters are in a dataset
- Writing PySpark equivalents of common pandas operations
Practising PySpark by solving exercises such as email classification, data clustering, and translating pandas operations into PySpark.