
Web repository for the "Cloud Programming Models" module


Cloud Programming

Lectures

| Date | Time | Room | Topics | Material |
|---|---|---|---|---|
| 10/03 | 8:30-10:30 | Online | Python 3 crash course. | Slides, Notes, Notebooks |
| 17/03 | 15:30-18:30 | Online | Python 3 crash course. Introduction to the course. Overview of parallel architectures and related programming paradigms. | Slides, Slides |
| 24/03 | 15:30-18:30 | Online | Large-scale programming issues. Functional programming concepts. MapReduce programming model. Examples with pseudocode: word count and image tiling. | Slides |
| 31/03 | 15:30-18:30 | Online | Partitioners and reducers. Introduction to Hadoop. Hadoop download and setup. | Notes, Notebooks |
| 07/04 | 15:30-18:30 | Online | Hadoop setup and configuration. Programming in Hadoop (Java). Wordcount exercise. | Slides, HDFS, Java |
| 21/04 | 15:30-18:30 | Online | Hadoop Distributed File System (HDFS). Hadoop runtime framework for MapReduce (YARN). Moving Average exercise. | Slides |
| 28/04 | 15:30-18:30 | Online | Fault tolerance in Hadoop MapReduce. MapReduce Design Patterns: Intermediate data reduction, Matrix generation and multiplication, Selection and filtering, Joining, Graph algorithms. Matrix Multiplication exercise. | Slides |
| 05/05 | 8:30-10:30 | Online | MapReduce graph algorithms. Spark introduction, download and setup. | Slides, Notes |
| 12/05 | 15:30-18:30 | Online | Spark: resilient distributed datasets, lineage. Spark architecture: driver, executors, context. Spark actions and transformations. Pi and Wordcount exercises. | Slides, Notes |
| 19/05 | 8:30-10:30 | Online | RDD persistence, broadcast variables and accumulators. Anatomy of a Spark job, wide and narrow transformations. | Slides |
| 22/05 | 8:30-11:30 | Online | Application development in Spark: communication costs, shuffling & sorting. MLlib. Logistic regression and PageRank exercises. | Slides |

Messages

Node Manager Issues
If you do not see all of your node managers listed in the Web UI (or when running the command yarn node -list -all on the hadoop-namenode machine), please add the following new property to the yarn-site.xml configuration file on all your machines:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop-namenode</value>
</property>

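Note that you must assign the same value on all your machines, i.e., the hostname of the virtual machine hosting the YARN resource manager. This allows the node managers to communicate correctly with the resource manager: by default, yarn.resourcemanager.hostname is 0.0.0.0, so each node manager tries to register with a resource manager on its own machine.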
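Since the property has to be identical on every machine, it can help to check and patch yarn-site.xml with a small script rather than by hand. The following is a minimal Python sketch, assuming the file follows the usual Hadoop layout of `<configuration>` containing `<property>` elements with `<name>` and `<value>` children. The helper names (get_property, set_property, same_rm_hostname) and the sample file are hypothetical illustrations, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

RM_PROPERTY = "yarn.resourcemanager.hostname"

# Hypothetical minimal yarn-site.xml that is missing the resource manager hostname.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>"""


def get_property(root, name):
    """Return the <value> text of the property called `name`, or None if absent."""
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def set_property(root, name, value):
    """Update the property if it exists, otherwise append a new <property> element."""
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def same_rm_hostname(config_texts):
    """True if every config sets the RM hostname, and all set the same value."""
    values = {get_property(ET.fromstring(text), RM_PROPERTY) for text in config_texts}
    return len(values) == 1 and None not in values


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(get_property(root, RM_PROPERTY))  # None: the property is missing
    set_property(root, RM_PROPERTY, "hadoop-namenode")
    print(get_property(root, RM_PROPERTY))  # hadoop-namenode
    print(ET.tostring(root, encoding="unicode"))
```

To patch a real file, one could load it with ET.parse (for example from $HADOOP_HOME/etc/hadoop/yarn-site.xml, assuming a standard installation layout), call set_property on tree.getroot(), and save it with tree.write. ElementTree drops XML comments and the xml-stylesheet processing instruction when rewriting, so keep a backup of the original. The YARN daemons typically need to be restarted before the change takes effect.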