This is the companion repository for the Data Analysis with Python and PySpark book (Manning, estimated publishing date: 2022). It contains the source code and, where pertinent, the data download scripts.
The complete data set for the book is around 1 GB. To keep the repository small enough to clone comfortably, I moved the data sources to Dropbox. The book assumes the data is under ./data.
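Since the book's examples expect the data under ./data, a quick check before running them can save some confusion. The snippet below is only a minimal sketch: the function name find_data_dir is hypothetical and not part of this repository, and it assumes you run it from the repository root.

```python
from pathlib import Path


def find_data_dir(root="."):
    """Return the data directory under `root`, or raise if it is missing.

    `find_data_dir` is a hypothetical helper for illustration only.
    """
    data_dir = Path(root) / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(
            f"Expected a data directory at {data_dir}; "
            "download the data sets before running the examples."
        )
    return data_dir
```

Calling find_data_dir() from the repository root returns the path to ./data when the directory exists and raises FileNotFoundError otherwise.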
If you encounter mistakes in the book manuscript (including the printed source code), please use the Manning platform to provide feedback.