Apache Hadoop

Primary Language: Python

Hadoop-Enthusiasts

Summary

  • This repository is for Hadoop enthusiasts. It covers the Hadoop ecosystem and the problems encountered while installing Hadoop in pseudo-distributed mode and on a multi-node cluster.
  • Topics include pseudo-distributed cluster installation, CDH5 multi-node cluster installation, HDFS commands, benchmarking, Hive, Sqoop, Impala, Flume, and Spark.
  • All modules should have proper documentation.

Table of contents

  • CDH5 Installation
    • Cloudera Distributed Cluster Installation
    • Screenshots for each step in CDH5 Installation
  • Commands
    • HDFS File System Commands
    • Hadoop Commands
  • Flume
    • Fetch Twitter data using Flume
    • Flume Documentation
  • Hibench
    • Benchmarking
  • Hive
    • DA440_LabFiles
    • Hive tasks
    • Hive documentation
  • Pseudo_Cluster
    • Cluster installation steps
  • Sqoop
    • Sqoop Tasks
  • Spark
    • cca175
    • DataFrame operations
    • File Formats
    • Final_Tasks
    • Hive_Tables
    • Sample Problems
    • Spark Document
    • Spark Practice RDD
    • Spark SQL Practice
    • Spark Submit
    • Spark Wordcount
    • Tasks on RDD and DataFrames
    • Task on Titanic Dataset
  • Fix Under Replicated Blocks
  • Change Ownership
  • CCA version 2