Date | Time | Room | Topics | Materials |
---|---|---|---|---|
10/03 | 8:30-10:30 | Online | Python 3 crash course. | Slides Notes Notebooks |
17/03 | 15:30-18:30 | Online | Python 3 crash course. Introduction to the course. Overview of parallel architectures and related programming paradigms. | Slides Slides |
24/03 | 15:30-18:30 | Online | Large-scale programming issues. Functional programming concepts. MapReduce programming model. Examples with pseudocode: word count and image tiling. | Slides |
31/03 | 15:30-18:30 | Online | Partitioners and reducers. Introduction to Hadoop. Hadoop download and setup. | Notes Notebooks |
07/04 | 15:30-18:30 | Online | Hadoop setup and configuration. Programming in Hadoop (Java). Wordcount exercise. | Slides HDFS Java |
21/04 | 15:30-18:30 | Online | Hadoop Distributed File System (HDFS). Hadoop runtime framework for MapReduce (YARN). Moving Average exercise. | Slides |
28/04 | 15:30-18:30 | Online | Fault tolerance in Hadoop MapReduce. MapReduce Design Patterns: Intermediate data reduction, Matrix generation and multiplication, Selection and filtering, Joining, Graph algorithms. Matrix Multiplication exercise. | Slides |
05/05 | 8:30-10:30 | Online | MapReduce graph algorithms. Spark introduction, download and setup. | Slides Notes |
12/05 | 15:30-18:30 | Online | Spark: resilient distributed datasets, lineage. Spark architecture: driver, executors, context. Spark actions and transformations. Pi and Wordcount exercises. | Slides Notes |
19/05 | 8:30-10:30 | Online | RDD persistence, broadcast variables and accumulators. Anatomy of a Spark job, wide and narrow transformations. | Slides |
22/05 | 8:30-11:30 | Online | Application development in Spark: communication costs, shuffling & sorting. MLlib. Logistic regression and PageRank exercises. | Slides |
Node Manager Issues
If you do not see all your node managers listed in the Web UI (or by running the command `yarn node -list -all` on the `hadoop-namenode` machine), please update the `yarn-site.xml` configuration files on all your machines with the following new property:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-namenode</value>
</property>
Note that you must assign the same value on all your machines, i.e., the hostname of the virtual machine hosting the YARN resource manager. This allows the node managers to communicate correctly with the resource manager.
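To confirm that a machine carries the property, a small script can parse the configuration and compare the value. The following is a minimal sketch, not part of the course material: the helper names `get_property` and `check_config` are hypothetical, and the sample XML is embedded in the script as an assumption in place of a real `yarn-site.xml`.

```python
import xml.etree.ElementTree as ET

# Sample configuration used for illustration only (an assumption);
# a real yarn-site.xml usually contains many more properties.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-namenode</value>
  </property>
</configuration>"""


def get_property(xml_text, name):
    """Return the value of the Hadoop-style <property> called `name`, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def check_config(xml_text, expected_host="hadoop-namenode"):
    """True if the resource manager hostname matches the expected host."""
    return get_property(xml_text, "yarn.resourcemanager.hostname") == expected_host


if __name__ == "__main__":
    print("resource manager hostname set correctly:", check_config(SAMPLE))
```

On a real machine you could pass the contents of your own `yarn-site.xml` to `check_config` instead of `SAMPLE`. The file location depends on your Hadoop installation.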