- Install Spark on Google Colab and load datasets in PySpark
- Change column data types, trim whitespace and drop duplicates
- Remove columns whose null values exceed a threshold
- Group and aggregate data and create pivot tables
- Rename categories and impute missing numeric values
- Create visualizations to gather insights