Everything related to the Hadoop Admin Workshop.

Hadoop Admin Workshop

Agenda

  1. 08:30: Prep + Food Ordering
  2. 09:00: Introduction and Expectations
  3. 09:30: Environment Setup
  4. 10:00: Introduction to Hadoop + Installation
  5. 12:30: Lunch break
  6. 13:15: Exploring HDFS (create user, find blocks, ...)
  7. 13:45: Distributed Systems: Concepts and Components
  8. 14:00: Benchmarking + Service Checks
  9. 14:30: Secure Cluster
  10. 16:30: Outlook. What to do next?
  11. 17:00: End + Let's have a drink!

Prep + Food Ordering

  • Join Vienna Admin WhatsApp Group
  • Go to mjam.at/blabla and send an email

Introduction and Expectations

  • Interactive Workshop
  • Gather experiences of participants
  • Gather expectations of participants on whiteboard (?)
  • Community Event: Help me to help you; help each other!
  • Take pictures and share them!

Environment Setup

  • Has everyone who wants to join the Hadoop Admin Vienna WhatsApp Group given me their number?
  • Is the local setup sufficient for the training?
    • SSH client + key in place?
    • SFTP client + key in place?
  • Has everyone received the private key by email?
  • Has everyone been assigned a set of 4 AWS machines?
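
The SSH checks above can be sketched in a few lines of Python. This is a minimal sketch, not part of the workshop material: the function names are made up for illustration, and the key path, user name and host name you pass in depend on what you received for your machines. OpenSSH refuses private keys that are readable by group or others, so the first helper checks for that before you try to connect.

```python
import os
import stat


def key_is_private(path):
    """Return True if neither group nor others have any permission on the key file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def ssh_check_command(key_path, user, host):
    """Build an ssh call that prints the remote hostname.

    BatchMode=yes makes ssh fail instead of waiting for a password prompt,
    so a missing or wrong key is noticed immediately.
    """
    return ["ssh", "-i", key_path, "-o", "BatchMode=yes", f"{user}@{host}", "hostname"]
```

If `key_is_private` returns False, `chmod 600` on the key file usually fixes it. The list returned by `ssh_check_command` can be passed to `subprocess.run` once per assigned machine.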

Introduction to Hadoop: Distributed Storage and Processing

Exploring HDFS

  • Hands On: Create a user directory (Linux user + HDFS directory structure)
  • Hands On: Load any file into HDFS
  • Hands On: Copy data via DistCp
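
The three hands-on steps above map to a handful of standard commands. The sketch below only builds the command lines and prints them, so it can be read and run without a cluster. It assumes that the HDFS superuser is the Linux account `hdfs` and that home directories live under `/user`; both are common defaults but may differ on your cluster. The user name, file name and NameNode URIs in the demo are placeholders.

```python
import shlex


def user_dir_commands(user):
    """Commands to create a Linux user and a matching HDFS home directory."""
    home = f"/user/{user}"
    return [
        ["sudo", "useradd", user],
        # Assumption: 'hdfs' is the HDFS superuser account on this cluster.
        ["sudo", "-u", "hdfs", "hdfs", "dfs", "-mkdir", "-p", home],
        ["sudo", "-u", "hdfs", "hdfs", "dfs", "-chown", user, home],
    ]


def upload_command(local_path, user):
    """Copy a local file into the user's HDFS home directory."""
    return ["hdfs", "dfs", "-put", local_path, f"/user/{user}/"]


def distcp_command(src, dst):
    """Copy data between HDFS locations with a distributed MapReduce job."""
    return ["hadoop", "distcp", src, dst]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    commands = user_dir_commands("alice")
    commands.append(upload_command("data.csv", "alice"))
    commands.append(distcp_command("hdfs://nn1:8020/user/alice", "hdfs://nn2:8020/user/alice"))
    for cmd in commands:
        print(shlex.join(cmd))
```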

Concepts and Components

  • Discuss the services and components!
  • Get familiar with the concepts: High Availability + Scalability (Master/Slave Architecture)
  • Inspect the blocks of a file in HDFS
  • Hands On: Adding a second node to the cluster.
  • Hands On: Make the installed cluster services highly available.
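
To make the block exercise concrete, here is a small worked calculation. It assumes the stock defaults of a 128 MiB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); your cluster may be configured differently. The helper names are made up for this sketch.

```python
MIB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MIB   # assumed default for dfs.blocksize
DEFAULT_REPLICATION = 3          # assumed default for dfs.replication


def block_count(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Number of HDFS blocks for a file: the size divided by the block size, rounded up."""
    return -(-file_size // block_size)


def raw_storage(file_size, replication=DEFAULT_REPLICATION):
    """Bytes consumed across all DataNodes once every block is replicated."""
    return file_size * replication


def fsck_command(path):
    """Ask the NameNode which blocks make up a file and where the replicas live."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


if __name__ == "__main__":
    size = 300 * MIB
    print(block_count(size))          # 3 blocks: 128 MiB + 128 MiB + 44 MiB
    print(raw_storage(size) // MIB)   # 900 MiB of raw disk space
```

The last block of a file only occupies as much disk space as it actually holds, which is why raw storage is computed from the file size and not from the block count.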

Troubleshooting and Service Checks

  • Hands On: Run service checks, check health of components
  • Hands On: Check log files
  • Hands On: Log Search?
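
For the log file exercise, a short script can summarise how many messages of each severity a daemon log contains. This is a sketch under one assumption: the log uses the usual log4j layout in which each entry starts with an ISO-style timestamp followed by the level, for example `2017-01-01 12:00:00,000 WARN ...`. If your cluster uses a different layout, the regular expression needs adjusting. The two listed commands are standard command-line health checks for HDFS and YARN.

```python
import re
import sys
from collections import Counter

# Assumed layout: "<yyyy-mm-dd> <hh:mm:ss,SSS> <LEVEL> <logger>: <message>"
ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) "
)

# Basic health checks from the command line.
SERVICE_CHECKS = [
    ["hdfs", "dfsadmin", "-report"],   # capacity and live/dead DataNodes
    ["yarn", "node", "-list", "-all"], # state of every NodeManager
]


def count_levels(lines):
    """Count log entries per level; continuation lines such as stack traces are skipped."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def problem_lines(lines, levels=("ERROR", "FATAL")):
    """Return the entries whose level is in `levels`, without trailing newlines."""
    found = []
    for line in lines:
        match = ENTRY.match(line)
        if match and match.group(1) in levels:
            found.append(line.rstrip("\n"))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python logcheck.py <path to a daemon log file>
    with open(sys.argv[1], errors="replace") as handle:
        print(dict(count_levels(handle)))
```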

Benchmarking

  • Discussion: Why Benchmarking?
  • Hands On: TeraGen - Generate Random Data
  • Hands On: TeraSort - Sort Random Data
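
TeraGen takes the number of rows to generate, not a size in bytes, and each row is 100 bytes long. The sketch below converts a target data size into a row count and builds the matching TeraGen, TeraSort and TeraValidate calls. The location of the MapReduce examples jar differs between distributions, so it is passed in as a parameter; the output directory names are made up for illustration.

```python
TERAGEN_ROW_BYTES = 100  # every TeraGen row is 100 bytes long


def rows_for_size(target_bytes):
    """Number of TeraGen rows needed to produce roughly `target_bytes` of data."""
    return target_bytes // TERAGEN_ROW_BYTES


def benchmark_commands(examples_jar, rows, base_dir):
    """Generate, sort and validate; directory names below are illustrative."""
    generated = f"{base_dir}/teragen"
    sorted_out = f"{base_dir}/terasort"
    report = f"{base_dir}/teravalidate"
    return [
        ["hadoop", "jar", examples_jar, "teragen", str(rows), generated],
        ["hadoop", "jar", examples_jar, "terasort", generated, sorted_out],
        ["hadoop", "jar", examples_jar, "teravalidate", sorted_out, report],
    ]


if __name__ == "__main__":
    print(rows_for_size(10**9))  # 10000000 rows for 1 GB of data
```

Timing each of the three jobs separately gives comparable numbers before and after a configuration change, as long as the row count stays the same.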

Resource Management

  • Hands On: YARN applications
  • Hands On: YARN Queues
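
For the queue exercise, one rule is worth checking before a configuration is applied: when the Capacity Scheduler is configured with percentages, the capacities of all queues under the same parent must add up to 100. The sketch below assumes the Capacity Scheduler in percentage mode and takes the properties as a plain dictionary; the queue names in the demo are invented. The two command helpers use the standard `yarn` CLI.

```python
PREFIX = "yarn.scheduler.capacity"


def child_capacities(props, parent="root"):
    """Map each child queue of `parent` to its configured capacity in percent."""
    names = [name.strip() for name in props[f"{PREFIX}.{parent}.queues"].split(",")]
    return {name: float(props[f"{PREFIX}.{parent}.{name}.capacity"]) for name in names}


def capacities_valid(props, parent="root"):
    """True if the children of `parent` add up to 100 percent."""
    total = sum(child_capacities(props, parent).values())
    return abs(total - 100.0) < 1e-6


def list_running_command():
    """List the applications that are currently running."""
    return ["yarn", "application", "-list", "-appStates", "RUNNING"]


def queue_status_command(queue):
    """Show state, capacity and current usage of one queue."""
    return ["yarn", "queue", "-status", queue]


if __name__ == "__main__":
    # Invented example: two queues sharing the cluster 60/40.
    example = {
        f"{PREFIX}.root.queues": "default,analytics",
        f"{PREFIX}.root.default.capacity": "60",
        f"{PREFIX}.root.analytics.capacity": "40",
    }
    print(capacities_valid(example))  # True
```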

Secure Cluster

  • Install Apache Ranger
  • Set up a KDC
  • Kerberize Cluster
  • Set up and configure Apache Knox
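
Kerberizing a cluster means that every service instance gets its own principal of the form `service/host@REALM` plus a keytab holding its key. The sketch below shows the manual steps with an MIT Kerberos KDC; it is an illustration, not the workshop's procedure, and cluster management tools can automate all of it. The service name, host, realm and keytab path used in the demo are placeholders.

```python
def service_principal(service, host, realm):
    """Build a service principal name such as nn/host.example.com@EXAMPLE.COM."""
    return f"{service}/{host}@{realm}"


def keytab_commands(principal, keytab_path):
    """Manual steps on an MIT Kerberos KDC host, to be run as root."""
    return [
        # Create the principal with a random key instead of a password.
        ["kadmin.local", "-q", f"addprinc -randkey {principal}"],
        # Export the key into a keytab file (this generates a fresh random key).
        ["kadmin.local", "-q", f"ktadd -k {keytab_path} {principal}"],
        # Verify: obtain a ticket from the keytab, then list the ticket cache.
        ["kinit", "-kt", keytab_path, principal],
        ["klist"],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    principal = service_principal("nn", "node1.example.com", "EXAMPLE.COM")
    for cmd in keytab_commands(principal, "/tmp/nn.service.keytab"):
        print(cmd)
```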

Outlook