This project applies Spark SQL to determine key metrics about home sales data, using Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.
This repository contains code that analyzes a home sales dataset with Apache Spark SQL. It includes the following functionality:
- Reading a CSV file from an AWS S3 bucket into a DataFrame.
- Creating temporary views of the DataFrame.
- Executing SQL queries to analyze the dataset.
- Caching and uncaching tables for performance optimization.
- Writing and querying partitioned, Parquet-formatted data.