This repository contains solutions for five Spark exercises:
- SparkSQL
- Spark RDD
- Spark DataFrame and Machine Learning Pipelines -- Gradient Boosted Tree
- Spark Application -- Crime Analysis
- Spark Application -- Profit Prediction
├── README.md <- You are here
├── SparkSQL
│ ├── exercise1.py <- Python source code file
│ ├── exercise1.png <- Output of the Spark Job
│ ├── exercise1-findings.txt <- Findings
│ ├── Problem_Statement.md <- Problem Statement
├── SparkRDD
│ ├── exercise2.py <- Python source code file
│ ├── exercise2.txt <- Output of the Spark Job
│ ├── exercise2-findings.txt <- Findings
│ ├── Problem_Statement.md <- Problem Statement
├── Spark_Machine_Learning_Pipeline
│ ├── exercise3.py <- Python source code file
│ ├── exercise3.txt <- Output of the Spark Job: out-of-sample R-squared of the model
│ ├── Problem_Statement.md <- Problem Statement
├── Spark_Application_Crime_Analysis
│ ├── exercise4.py <- Python source code file
│ ├── exercise4.txt <- Output of the Spark Job
│ ├── exercise4.png <- Output of the Spark Job
│ ├── exercise4-findings.txt <- Findings
│ ├── Problem_Statement.md <- Problem Statement
├── Spark_Application_Profit_Prediction
│ ├── exercise5.py <- Python source code file
│ ├── mape_all.txt <- Output of the Spark Job
│ ├── Problem_Statement.md <- Problem Statement
<!-- tocstop -->