Home_Sales

Uses Spark SQL to determine key metrics about home sales data, and uses Spark to create temporary views, partition the data, cache and uncache a temporary table, and verify that the table has been uncached.

Primary Language: Jupyter Notebook

Spark SQL Home Sales Analysis

This repository contains code that performs analysis on a dataset of home sales using Apache Spark SQL.

The code includes the following functionality:

  • Reading a CSV file from an AWS S3 bucket into a DataFrame.
  • Creating temporary views of the DataFrame.
  • Executing SQL queries to analyze the dataset.
  • Caching and uncaching tables for performance optimization.
  • Working with Parquet-formatted data.
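
The steps listed above can be sketched in PySpark roughly as follows. This is a minimal illustration and not the repository's actual notebook code: the column names (`date_built`, `bedrooms`, `price`), the view names, and the helper functions `avg_price_sql` and `run_demo` are all assumptions made for the example, and a few in-memory rows stand in for the CSV that the repository reads from S3.

```python
import os
import tempfile


def avg_price_sql(view_name: str, bedrooms: int) -> str:
    """Build a query for the average price per build year (hypothetical columns)."""
    return (
        f"SELECT date_built, ROUND(AVG(price), 2) AS avg_price "
        f"FROM {view_name} WHERE bedrooms = {int(bedrooms)} "
        f"GROUP BY date_built ORDER BY date_built"
    )


def run_demo() -> dict:
    # Imported here so avg_price_sql stays usable without a Spark install.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("home_sales_sketch")
        .getOrCreate()
    )
    try:
        # Stand-in rows (assumed schema). With a real file this would be
        # something like spark.read.csv(path, header=True, inferSchema=True).
        rows = [(2015, 3, 300000.0), (2015, 3, 320000.0), (2016, 4, 450000.0)]
        df = spark.createDataFrame(rows, ["date_built", "bedrooms", "price"])

        # Register a temporary view and query it with SQL.
        df.createOrReplaceTempView("home_sales")
        result = spark.sql(avg_price_sql("home_sales", 3)).collect()

        # Cache the view, then confirm that Spark reports it as cached.
        spark.catalog.cacheTable("home_sales")
        was_cached = spark.catalog.isCached("home_sales")

        # Write Parquet partitioned by build year, read it back, and query it.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "home_sales_parquet")
            df.write.partitionBy("date_built").parquet(path)
            spark.read.parquet(path).createOrReplaceTempView("home_sales_parquet")
            parquet_rows = spark.sql(
                "SELECT COUNT(*) AS n FROM home_sales_parquet"
            ).collect()[0]["n"]

        # Uncache the view and verify that it is no longer cached.
        spark.catalog.uncacheTable("home_sales")
        still_cached = spark.catalog.isCached("home_sales")

        return {
            "avg_prices": [(r["date_built"], r["avg_price"]) for r in result],
            "was_cached": was_cached,
            "still_cached": still_cached,
            "parquet_rows": parquet_rows,
        }
    finally:
        spark.stop()
```

Calling `run_demo()` requires a local PySpark installation with a Java runtime. The same calls (`createOrReplaceTempView`, `spark.catalog.cacheTable`, `spark.catalog.isCached`, `spark.catalog.uncacheTable`) should apply unchanged to a DataFrame loaded from S3, assuming the session is configured with the appropriate S3 connector and credentials.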