DAT500-19-sample

UiS DAT500 PySpark sample code.

The cluster is still being deployed, so if your group's cluster is not ready yet, you can practice in your local environment by following this guide.

1. Getting Started

Access your Jupyter notebook (e.g. https://group8-jp.wiktorskit.sigma2.no).

Installation

cd <YOUR-GROUP-FOLDER>
git clone https://github.com/thejungwon/dat500-19-sample.git
cd dat500-19-sample
wget -O actors.list "https://www.dropbox.com/s/vofyl0uryectfyt/actors.list?dl=1"
# or
curl -L -o actors.list "https://www.dropbox.com/s/vofyl0uryectfyt/actors.list?dl=1"

(Option #1) Running with regular Python

<ABSOLUTE_PATH> for each group:

  • Group 1,3,5,7,9,11,13,15 : /mnt/wiktorskit-jungwonseo-ns0000k/home/notebook/YOUR_GROUP_FOLDER
  • Group 2,4,6,8,10,12,14 : /mnt/wiktorskit-danielb-ns0000k/home/notebook/YOUR_GROUP_FOLDER
python word_cnt.py <ABSOLUTE_PATH>/dat500-19-sample/actors.list <ABSOLUTE_PATH>/dat500-19-sample/output01
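
word_cnt.py takes exactly these two arguments: an input file and an output directory. The script itself is not reproduced here, but a minimal PySpark word count with the same command-line interface could look like the following sketch (an illustration, not the exact contents of the repository's script). Running it with plain python assumes the pyspark package is importable in your environment.

import sys
from pyspark import SparkContext

if __name__ == "__main__":
    # Input file and output directory are given on the command line,
    # matching the invocation shown above.
    input_path, output_path = sys.argv[1], sys.argv[2]

    sc = SparkContext(appName="word_cnt")

    # Classic word count: split each line into words and count occurrences.
    counts = (sc.textFile(input_path)
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))

    # Write the (word, count) pairs as text files under the output directory.
    counts.saveAsTextFile(output_path)
    sc.stop()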

(Option #2) Running with spark-submit command

spark-submit --verbose word_cnt.py <ABSOLUTE_PATH>/dat500-19-sample/actors.list <ABSOLUTE_PATH>/dat500-19-sample/output01

You may want to set additional configuration options. (Try this configuration later, when you need to do performance analysis.)

spark-submit --verbose \
--executor-memory 1g \
--driver-memory 1g \
--conf spark.driver.memoryOverhead=1024m \
--conf spark.executor.memoryOverhead=1024m \
word_cnt.py <ABSOLUTE_PATH>/dat500-19-sample/actors.list <ABSOLUTE_PATH>/dat500-19-sample/output01
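
To verify that these settings actually reached the application, you can print the active configuration from inside the job. The snippet below is a quick diagnostic sketch; it assumes a SparkContext named sc, as in the word-count example above.

# List every configuration key/value visible to the running application,
# e.g. spark.executor.memory and spark.executor.memoryOverhead.
for key, value in sorted(sc.getConf().getAll()):
    print(key, "=", value)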

(Option #3) Running with Jupyter Notebook
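
The repository's notebook is not reproduced here, but a minimal cell that runs the same word count interactively could look like the sketch below. It assumes the pyspark package is importable in the notebook kernel; replace <ABSOLUTE_PATH> with your group's path from the list above.

from pyspark.sql import SparkSession

# Start (or reuse) a Spark session inside the notebook.
spark = SparkSession.builder.appName("word_cnt_notebook").getOrCreate()
sc = spark.sparkContext

# Same word count as word_cnt.py, but the result is collected into the
# notebook instead of being written to an output directory.
counts = (sc.textFile("<ABSOLUTE_PATH>/dat500-19-sample/actors.list")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

# Show the ten most frequent words.
print(counts.takeOrdered(10, key=lambda pair: -pair[1]))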

2. Available Cluster for Each Group

Updated: Tue Mar 26 11:12:14 CET 2019

Group    G1  G2  G3  G4  G5  G6  G7  G8  G9  G10 G11 G12 G13 G14 G15
Status   O   O   O   O   O   O   O   O   O   O   O   O   O   O   O
Jupyter  O   O   O   O   O   O   O   O   O   O   O   O   O   O   O
Spark    O   O   O   O   O   O   O   O   O   O   O   O   O   O   O

3. Built With

  • Python3

4. Tips and Tricks

  • (context menu) Shift + right-click
  • (paste) Shift + Insert
  • (copy) Ctrl + Insert