Tips and Tricks
This repo contains a collection of Spark code, written mostly in Python (using the PySpark API). I have also included code and scripts in Scala and SparkR. Feel free to copy and use as-is. Let me know if you have any questions or feedback regarding any of the code.
Zeppelin Notebook Hub (can be used to view Zeppelin notebooks stored in JSON format):
Spark Tuning & Best Practices Reference:
Spark Tuning Tool:
Machine Learning Cheatsheets:
• SKLearn - Choosing the right estimator
• Keras Cheatsheet
• SAS - ML Algorithms
• MS Azure - ML Algorithms
• Kaggle ML Solutions
• Apache Spark Quickstart
• Spark PySpark (Python) API
• Databricks - Guide
• Databricks - Developer Resources
• Spark Tuning Guide
• Spark Tuning - Garbage Collection
• Hortonworks - Spark Reference
• Anaconda Hortonworks Management Packs
• Apache Spark - Best Practices & Tuning
• PySpark Cheatsheet