Hadoop

Hadoop-Enthusiasts

This repository is for Hadoop enthusiasts. It covers the Hadoop ecosystem and the problems encountered while installing Hadoop in pseudo-distributed mode and on a multi-node cluster.
It includes pseudo-distributed cluster installation, CDH5 multi-node cluster installation, HDFS commands, benchmarking, Hive, Sqoop, Impala, Flume, and Spark.
All modules should have proper documentation.

CDH5 Installation
- Cloudera Distributed Cluster Installation
- Screenshots for each step in CDH5 Installation
Commands
- HDFS File System Commands
- Hadoop Commands
Flume
- Fetch Twitter data using Flume
- Flume Documentation
Hibench
- Benchmarking
Hive
- DA440_LabFiles
- Hive tasks
- Hive Documentation
Pseudo_Cluster
- Cluster installation steps
Sqoop
- Sqoop Tasks
Spark
- cca175
- DataFrame operations
- File Formats
- Final_Tasks
- Hive_Tables
- Sample Problems
- Spark Document
- Spark Practice RDD
- Spark SQL Practice
- Spark Submit
- Spark Wordcount
- Tasks on RDD and DataFrames
- Task on Titanic Dataset
Fix Under Replicated Blocks
Change Ownership
CCA version 2