Code accompanying my Medium article on using Docker and PySpark: https://medium.com/@crocker456/using-docker-and-pyspark-134cd4cab867
This repository contains a notebook that walks through several PySpark and Spark SQL concepts. The notebook is a bit messy because I've been adding examples to it over time.
The CSV file used in the notebook can be downloaded here: https://data.vermont.gov/Finance/Vermont-Vendor-Payments/786x-sbp3
The CSV is not included in this repository because it is quite large.
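
Since the CSV has to be downloaded separately, here is a minimal sketch of how it could be located and loaded with PySpark once it is on disk. The file name `Vermont_Vendor_Payments.csv` and the helpers `find_csv` and `load_payments` are hypothetical and are not taken from the notebook. Reading with `header=True` and `inferSchema=True` is one reasonable choice and may differ from what the notebook does.

```python
from pathlib import Path

# Hypothetical local file name; use whatever name you saved the download under.
CSV_NAME = "Vermont_Vendor_Payments.csv"


def find_csv(data_dir=".", name=CSV_NAME):
    """Return the path to the downloaded CSV, or fail with a pointer to the source."""
    path = Path(data_dir) / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download it from "
            "https://data.vermont.gov/Finance/Vermont-Vendor-Payments/786x-sbp3"
        )
    return path


def load_payments(csv_path):
    """Read the CSV into a Spark DataFrame, letting Spark infer column types."""
    # Imported here so the path check above works even without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("vermont-vendor-payments").getOrCreate()
    # header=True takes column names from the first row; inferSchema=True costs
    # an extra pass over the file but avoids every column being read as a string.
    return spark.read.csv(str(csv_path), header=True, inferSchema=True)
```

With the file in the working directory, `df = load_payments(find_csv())` followed by `df.printSchema()` should show the inferred columns. For a file this large, passing an explicit schema instead of `inferSchema=True` would skip the extra scan.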