- This repository is for Hadoop enthusiasts. It covers the Hadoop ecosystem and the problems encountered while installing Hadoop in pseudo-distributed mode and on a multi-node cluster.
- Topics include pseudo-distributed cluster installation, CDH5 multi-node cluster installation, HDFS commands, benchmarking, Hive, Sqoop, Impala, Flume, and Spark.
- All modules should have proper documentation.
- CDH5 Installation
- Cloudera Distributed Cluster Installation
- Screenshots for each step in CDH5 Installation
- Commands
- HDFS File System Commands
- Hadoop Commands
- Flume
- Fetch Twitter data using Flume
- Flume Documentation
- HiBench
- Benchmarking
- Hive
- DA440_LabFiles
- Hive Tasks
- Hive Documentation
- Pseudo_Cluster
- Cluster installation steps
- Sqoop
- Sqoop Tasks
- Spark
- cca175
- DataFrame operations
- File Formats
- Final_Tasks
- Hive_Tables
- Sample Problems
- Spark Documentation
- Spark Practice RDD
- Spark SQL Practice
- Spark Submit
- Spark Wordcount
- Tasks on RDD and DataFrames
- Task on Titanic Dataset
- Fix Under Replicated Blocks
- Change Ownership
- CCA version 2