DAT500-19-sample

UiS DAT500 PySpark sample code.

The cluster is still being deployed, so if your group's cluster is not ready yet, you can practice in your local environment by following this guide.

1. Getting Started

Access your Jupyter notebook (e.g. https://group8-jp.wiktorskit.sigma2.no).

Installation

cd <YOUR-GROUP-FOLDER>
git clone https://github.com/thejungwon/dat500-19-sample.git
cd dat500-19-sample
wget -O actors.list "https://www.dropbox.com/s/vofyl0uryectfyt/actors.list?dl=1"
# or
curl -L -o actors.list "https://www.dropbox.com/s/vofyl0uryectfyt/actors.list?dl=1"

(Option #1) Running with regular Python

<ABSOLUTE_PATH> for each group:

  • Group 1,3,5,7,9,11,13,15 : /mnt/wiktorskit-jungwonseo-ns0000k/home/notebook/YOUR_GROUP_FOLDER
  • Group 2,4,6,8,10,12,14 : /mnt/wiktorskit-danielb-ns0000k/home/notebook/YOUR_GROUP_FOLDER
python word_cnt.py <ABSOLUTE_PATH>/dat500-19-sample/actors.list <ABSOLUTE_PATH>/dat500-19-sample/output01
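
word_cnt.py takes exactly these two arguments: an input file and an output directory. The script itself is not reproduced here, but a minimal PySpark word count with the same command-line interface could look like the following sketch (an illustration, not the exact contents of the repository's script). Running it with plain python assumes the pyspark package is importable in your environment.

import sys
from pyspark import SparkContext

if __name__ == "__main__":
    # Input file and output directory are given on the command line,
    # matching the invocation shown above.
    input_path, output_path = sys.argv[1], sys.argv[2]

    sc = SparkContext(appName="word_cnt")

    # Classic word count: split each line into words and count occurrences.
    counts = (sc.textFile(input_path)
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))

    # Write the (word, count) pairs as text files under the output directory.
    counts.saveAsTextFile(output_path)
    sc.stop()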

(Option #2) Running with spark-submit command

spark-submit --verbose word_cnt.py <ABSOLUTE_PATH>/dat500-19-sample/actors.list <ABSOLUTE_PATH>/dat500-19-sample/output01

You may want to set additional configuration options. (Try this configuration later, when you need to do performance analysis.)

spark-submit --verbose \
--executor-memory 1g \
--driver-memory 1g \
--conf spark.driver.memoryOverhead=1024m \
--conf spark.executor.memoryOverhead=1024m \
word_cnt.py <ABSOLUTE_PATH>/dat500-19-sample/actors.list <ABSOLUTE_PATH>/dat500-19-sample/output01
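
To verify that these settings actually reached the application, you can print the active configuration from inside the job. The snippet below is a quick diagnostic sketch; it assumes a SparkContext named sc, as in the word-count example above.

# List every configuration key/value visible to the running application,
# e.g. spark.executor.memory and spark.executor.memoryOverhead.
for key, value in sorted(sc.getConf().getAll()):
    print(key, "=", value)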

(Option #3) Running with Jupyter Notebook
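
The repository's notebook is not reproduced here, but a minimal cell that runs the same word count interactively could look like the sketch below. It assumes the pyspark package is importable in the notebook kernel; replace <ABSOLUTE_PATH> with your group's path from the list above.

from pyspark.sql import SparkSession

# Start (or reuse) a Spark session inside the notebook.
spark = SparkSession.builder.appName("word_cnt_notebook").getOrCreate()
sc = spark.sparkContext

# Same word count as word_cnt.py, but the result is collected into the
# notebook instead of being written to an output directory.
counts = (sc.textFile("<ABSOLUTE_PATH>/dat500-19-sample/actors.list")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

# Show the ten most frequent words.
print(counts.takeOrdered(10, key=lambda pair: -pair[1]))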

2. Available Cluster for Each Group

Updated: Tue Mar 26 11:12:14 CET 2019

Group    G1  G2  G3  G4  G5  G6  G7  G8  G9  G10 G11 G12 G13 G14 G15
Status   O   O   O   O   O   O   O   O   O   O   O   O   O   O   O
Jupyter  O   O   O   O   O   O   O   O   O   O   O   O   O   O   O
Spark    O   O   O   O   O   O   O   O   O   O   O   O   O   O   O

3. Built With

  • Python3

4. Tips and Tricks

  • (context menu) Shift + right-click
  • (paste) Shift + Insert
  • (copy) Ctrl + Insert