Playing With Pyspark

Code accompanying my Medium article on using Docker and PySpark: https://medium.com/@crocker456/using-docker-and-pyspark-134cd4cab867

This repository contains a notebook in which I walk through several PySpark and Spark SQL concepts. The notebook is a bit messy because I keep adding examples to it.

The CSV file used in the notebook can be downloaded here: https://data.vermont.gov/Finance/Vermont-Vendor-Payments/786x-sbp3

The CSV is not included in this repository because it is quite large.
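
Once the file has been downloaded, loading it and querying it with Spark SQL might look roughly like the sketch below. This is not taken from the notebook: the local file name `Vermont_Vendor_Payments.csv`, the view name `payments`, and the helper names `to_snake_case`, `load_payments`, and `main` are all assumptions made for illustration. The sketch also assumes some CSV headers contain spaces or punctuation, which is why it renames the columns before running SQL.

```python
import re


def to_snake_case(name):
    """Turn a header such as 'Some Column' into 'some_column' (hypothetical helper)."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


def load_payments(spark, csv_path):
    """Read the CSV into a DataFrame with SQL-friendly column names."""
    # header=True uses the first row as column names; inferSchema=True guesses types.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    # toDF(*names) returns a new DataFrame with the columns renamed in order.
    return df.toDF(*[to_snake_case(c) for c in df.columns])


def main(csv_path="Vermont_Vendor_Payments.csv"):  # assumed local file name
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PlayingWithPyspark").getOrCreate()
    payments = load_payments(spark, csv_path)
    payments.printSchema()

    # Register the DataFrame as a temporary view so it can be queried with SQL.
    payments.createOrReplaceTempView("payments")
    spark.sql("SELECT COUNT(*) AS n_rows FROM payments").show()
    spark.stop()
```

Calling `main()` from a notebook cell should print the inferred schema and the row count, provided the CSV sits in the working directory under the assumed name.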
