Hadoop Cluster Configurations
This repository is intended to help Hadoop users, specifically those with a system administration background, set up Hadoop quickly and efficiently.
The config files are from a running cluster. Feel free to use them, but please drop an email with your feedback.
I have uploaded a 64-bit build of the latest stable release, hadoop-2.9.2, to Google Drive.
For any help you can reach me at: trainings@netxillon.com
I provide Advanced Hadoop Administration, DevOps, HBase, Kafka and other trainings.
Advanced Hadoop Training: I cover topics such as Kerberos in detail, encryption, centralized caching, storage policies, Ranger, Knox, Hadoop performance tuning, and production use cases. Contact me for details.
"Doing a course is not a guarantee for a job, but having a solid foundation surely is"
This is the Hadoop Administration course; all of its configs are in this GitHub repository.
Demo: https://www.youtube.com/channel/UC6vfYICj0azZkuc5sVw71PA
Duration: 24 hours
Module 1: Hadoop High Availability for HDFS and Resource Manager.
- Using both QJM (Quorum Journal Manager) and shared storage.
- Zookeeper Details.
Module 2: Hadoop Queuing and pools details.
- Fair and Capacity Scheduler details.
- Dynamic pool configuration.
- User management and LDAP integration.
- Dynamic shares and scheduling policies.
Module 3: HDFS Advanced Features
- Hadoop Centralised Caching.
- Hadoop Storage Policy and Archive Storage.
- Hadoop memory as storage tier.
- HDFS Extended Attributes.
- HDFS Short-circuit Read.
- Quotas per storage type.
- Snapshots and HDFS over NFS.
- YARN Labels
Module 4: In-depth Performance tuning and Cluster Sizing.
- JVM tuning for Hadoop.
- HDFS and MapReduce Tuning.
- Network tuning.
- YARN Performance tuning and details on parameters.
Module 5: Hadoop Security.
- Hadoop Knox or any other security tool.
- Detailed Kerberos setup for securing Hadoop.
- Hadoop Encryption at rest.
Module 6: Hadoop Upgrade and Production use cases.
- Hadoop Rolling upgrade.
- Phoenix details and setup.
- HDFS Configuration for multihoming.
- NameNode Recovery scenarios.
- Common production issues.
Module 7: HBase and Hive.
- HBase Administration and troubleshooting.
- Hive and HBase recovery and upgrades.
- HBase and Hive production use cases and common issues.
Duration: 24 hours
Module 1: Using Hadoop as a warehouse.
- Data policies.
- Various ingestion and extraction methods.
- Archiving policies
Module 2: Flume Configuration.
- Flume Installation and Configuration.
- Flume channels and various formats.
- Flume Twitter use case.
Module 3: Data Ingestion using Sqoop and Hive
- Sqoop details.
- MySQL imports and exports
- Tuning Sqoop
- Hive details and integration with Sqoop
- HBase integration
Module 4: Spark installation and Configuration.
- Spark Architecture
- Spark standalone mode setup.
- Spark in YARN mode.
- Spark use cases and programs.
Module 5: Data Pipeline
- Understand Kafka architecture and configuration.
- Building a Kafka Data pipeline.
- Example and common issues.
- Integrating Spark with Kafka.
Module 6: Storm Architecture.
- Storm Cluster Setup.
- Storm Use Cases.
- Adding Storm to the Data Pipeline.
- Storm performance tuning.
Module 7: Project.
This is an advanced course; the user is expected to have a good grasp of the Hadoop platform, including HDFS, as well as OS knowledge.
- HBase Architecture Details
- HBase Troubleshooting
- HBase use cases.
- HBase Row key Design
- HBase coprocessors
- HBase replica
- HBase Kerberos Setup