Scalable Data Analysis of BlueGene/L Supercomputer Logs Using Apache Spark

Scalable-Cloud-Programming-SPARK

Step 1: Open your terminal and navigate to the project root directory (SCP_Project-x21171203). Change to this directory before starting PySpark.

cd ~/SCP_Project-x21171203

Step 2: Start PySpark by running the command below, then open Jupyter Notebook in your browser at the 'URL:port' printed in the command's output.

pyspark
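Note that `pyspark` opens Jupyter only when the Spark driver is configured to use it; otherwise it starts an interactive shell in the terminal. The sketch below shows one way to set that up from Python, assuming the standard `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS` environment variables are not already set on your machine. The helper names `jupyter_driver_env` and `launch_pyspark` are hypothetical and not part of this project.

```python
import os
import subprocess


def jupyter_driver_env(base_env=None):
    """Return a copy of the environment that tells pyspark to start Jupyter.

    Assumption: Jupyter Notebook is installed and on PATH.
    """
    env = dict(os.environ if base_env is None else base_env)
    # These two variables make the `pyspark` launcher run
    # `jupyter notebook` as the driver instead of the plain Python shell.
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env


def launch_pyspark(project_dir="~/SCP_Project-x21171203"):
    """Start pyspark from the project root (hypothetical helper)."""
    subprocess.run(
        ["pyspark"],
        cwd=os.path.expanduser(project_dir),
        env=jupyter_driver_env(),
        check=True,
    )
```

Calling `launch_pyspark()` has the same effect as Steps 1 and 2 run by hand. If your shell profile already exports these variables, running `pyspark` directly is enough.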

Step 3: When Jupyter Notebook starts, you will see the five .ipynb files listed below. Click any of them to open it in a new tab.

Question_2.ipynb
Question_6.ipynb
Question_12.ipynb
Question_16.ipynb
Question_17.ipynb

Step 4: Click 'Cell' in the menu bar and select 'Run All'. This runs every cell in that .ipynb file and produces the analysis results and graphs.

--- Repeat Step 3 and Step 4 for each notebook to run all the questions ---
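As an alternative to opening each notebook and clicking 'Run All', the five notebooks could be executed in one pass with `jupyter nbconvert`. This is a minimal sketch, not part of the project: it assumes `nbconvert` is installed and that each notebook creates its own Spark session rather than relying on the `spark` or `sc` objects that the `pyspark` launcher predefines. The helper names `nbconvert_command` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path

# The five notebooks listed in Step 3.
NOTEBOOKS = [
    "Question_2.ipynb",
    "Question_6.ipynb",
    "Question_12.ipynb",
    "Question_16.ipynb",
    "Question_17.ipynb",
]


def nbconvert_command(notebook):
    """Build the command that executes one notebook and saves results in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",   # run every cell, like 'Run All'
        "--inplace",   # write outputs back into the same .ipynb file
        str(notebook),
    ]


def run_all(project_dir="~/SCP_Project-x21171203", notebooks=NOTEBOOKS):
    """Execute each notebook in turn; stop at the first failure."""
    root = Path(project_dir).expanduser()
    for name in notebooks:
        subprocess.run(nbconvert_command(root / name), check=True)
```

Calling `run_all()` overwrites each notebook with its executed version, so the results and graphs are visible the next time the file is opened in Jupyter.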
