
This repository covers the basics of PySpark: regular RDDs, pair RDDs, and some basic transformations and actions.

Primary language: Jupyter Notebook

PySpark-Basics

The first part of this repository covers the basics of PySpark: regular RDDs, pair RDDs, and some basic transformations and actions. The second part covers advanced RDD actions, PySpark DataFrames, SparkSession, PySpark SQL, and data visualization. The third part covers MLlib in PySpark, focusing on three key areas: collaborative filtering, classification, and clustering.