Apache Spark (PySpark) Scripts and References


This repo contains Spark code written in Python using the PySpark API. Feel free to copy the code and use it as-is. Let me know if you have any questions or feedback about any of it.

Zeppelin Notebook Hub (a viewer for Zeppelin notebooks saved in JSON format): https://www.zeppelinhub.com/viewer/

References:
Apache Spark Quickstart
Spark PySpark (Python) API
Databricks - Guide
Databricks - Developer Resources
Spark Tuning Guide
Spark Tuning - Garbage Collection
Hortonworks - Spark Reference