Hadoop Example Program Demo

Program Dependencies

These are the dependencies I used to compile and run these programs:

Hadoop 3.3.0
Hadoop 3.3.0 fixes (included in this project as hadoop-3.3.0-configs.zip)
JDK 8 (Java Development Kit 8)
Windows 10 as the operating system

Recommended Practice

This is my first attempt at Hadoop. I found it a bit buggy, and it needed a lot of manual configuration, so here is my advice:

Run start-dfs and start-yarn as Administrator. When I ran them without administrator rights, the HDFS and YARN services did not run properly.

Algorithms Included

Word Count: counts the occurrences of every word (separated by whitespace) in a text file.
Average: computes the average transaction value for each ID (each ID can have multiple transactions).
Top Ten: finds the ten transactions with the highest value.
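The word-count idea above can be tried without a cluster. Below is a minimal, Hadoop-free Java sketch (the class `WordCountSketch` and its methods are hypothetical, not taken from WordCount.java) that mimics the map step, emitting `(word, 1)` for each whitespace-separated token, and the reduce step, summing the counts per word:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of the word-count logic; it is NOT the
// project's WordCount.java, only an illustration of the map/reduce idea.
class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) { // a blank line splits into one empty token
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce step: group the pairs by word and sum their counts.
    // TreeMap keeps the words sorted, much like Hadoop sorts keys before reducing.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map step on every line, then reduces all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello world");
        // Prints one "word<TAB>count" line per word, sorted by word.
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In a real Hadoop job the map calls run in parallel across input splits and the framework groups the pairs by key before the reducer sees them; here both happen in one process.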

Word Count

Compile

Navigate to the WordCount folder to make the following steps easier.

Compile WordCount.java and link it against the required Hadoop libraries in %HADOOP_HOME%\share\hadoop, for example with this command:
```
javac -classpath "D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\common\*";"D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\mapreduce\*" -d WordCount/ WordCount.java
```

Package the compiled classes into a jar file, for example with this command:
```
jar -cvf WordCount.jar -C WordCount/ .
```

As an example, this was the compilation step in my test:

Run

Make sure the HDFS and YARN services are already running by starting them with these commands (I recommend running them in a terminal as Administrator):
```
start-dfs
start-yarn
```
%HADOOP_HOME%\sbin is already in my PATH, so these commands can be run directly.
Write the text file whose words you want to count, for example wordcount.txt.
Create a directory in HDFS to use as the input directory, for example /input9, by running this command:
```
hadoop fs -mkdir /input9
```
Put your text file into the HDFS directory you just created (it should be empty), for example with this command:
```
hadoop fs -put wordcount.txt /input9
```
Check that the file is there by listing the directory, for example with this command:
```
hadoop fs -ls /input9
```
It should show something like this:
Run the .jar program with the directory you just created as the input directory and a new, non-existent directory as the output directory, for example with this command:
```
hadoop jar WordCount.jar WordCount /input9 /output9
```
Wait for the job to finish; the output should look something like this:
After it finishes, explore the output directory and cat the file in it; you should find the result there, as in this example:
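To check a result programmatically instead of by eye, the text printed by `hadoop fs -cat` can be parsed. A minimal sketch, assuming the job writes with Hadoop's default `TextOutputFormat`, which puts one `key<TAB>value` record per line (reducer output files are typically named like `part-r-00000`); the class `JobOutputParser` is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for checking job output: parses text in the default
// "key<TAB>value" layout that Hadoop's TextOutputFormat writes, one record per line.
class JobOutputParser {

    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps file order
        for (String line : text.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int tab = line.indexOf('\t'); // key ends at the first TAB
            if (tab < 0) {
                throw new IllegalArgumentException("Expected key<TAB>value, got: " + line);
            }
            result.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Example text as `hadoop fs -cat` might print it for a word count.
        String sample = "hadoop\t1\nhello\t2\n";
        System.out.println(parse(sample)); // {hadoop=1, hello=2}
    }
}
```

The same parser works for the Average and Top Ten outputs, since only the meaning of the key and value changes.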

Average

Compile

Navigate to the Average folder to make the following steps easier.

Compile Average.java and link it against the required Hadoop libraries in %HADOOP_HOME%\share\hadoop, for example with this command:
```
javac -classpath "D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\common\*";"D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\mapreduce\*" -d Average/ Average.java
```

Package the compiled classes into a jar file, for example with this command:
```
jar -cvf Average.jar -C Average/ .
```

As an example, this was the compilation step in my test:

Run

Make sure the HDFS and YARN services are already running by starting them with these commands (I recommend running them in a terminal as Administrator):
```
start-dfs
start-yarn
```
%HADOOP_HOME%\sbin is already in my PATH, so these commands can be run directly.
Write the text file containing the transactions to average, for example average.txt.
Create a directory in HDFS to use as the input directory, for example /input11, by running this command:
```
hadoop fs -mkdir /input11
```
Put your text file into the HDFS directory you just created (it should be empty), for example with this command:
```
hadoop fs -put average.txt /input11
```
Check that the file is there by listing the directory, for example with this command:
```
hadoop fs -ls /input11
```
It should show something like this:
Run the .jar program with the directory you just created as the input directory and a new, non-existent directory as the output directory, for example with this command:
```
hadoop jar Average.jar Average /input11 /output11
```
Wait for the job to finish; the output should look something like this:
After it finishes, explore the output directory and cat the file in it; you should find the result there, as in this example:
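To cross-check the per-ID averages locally, here is a minimal Java sketch. It assumes each input line looks like `<id> <value>` separated by whitespace; that format is a guess, since this README does not show the contents of average.txt, and the class `AverageSketch` is hypothetical:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Local sketch of the per-ID average, ASSUMING each input line looks like
// "<id> <value>" (whitespace-separated). The real average.txt format used by
// Average.java may differ; adjust the parsing if it does.
class AverageSketch {

    static Map<String, Double> averageById(Iterable<String> lines) {
        // Per-ID state: running sum and count, not a running average.
        Map<String, double[]> sumAndCount = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\\s+");
            String id = fields[0];
            double value = Double.parseDouble(fields[1]);
            double[] acc = sumAndCount.computeIfAbsent(id, k -> new double[2]);
            acc[0] += value; // sum
            acc[1] += 1;     // count
        }
        // Divide only once all values for an ID have been seen.
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        Map<String, Double> result = averageById(Arrays.asList("A 10", "B 4", "A 20"));
        result.forEach((id, avg) -> System.out.println(id + "\t" + avg));
    }
}
```

Keeping a (sum, count) pair instead of a running average matters if a combiner is added to a Hadoop average job: averaging partial averages gives a wrong answer when an ID has different numbers of transactions in different splits.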

Top Ten

Compile

Navigate to the TopTen folder to make the following steps easier.

Compile TopTen.java and link it against the required Hadoop libraries in %HADOOP_HOME%\share\hadoop, for example with this command:
```
javac -classpath "D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\common\*";"D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\mapreduce\*" -d TopTen/ TopTen.java
```

Package the compiled classes into a jar file, for example with this command:
```
jar -cvf TopTen.jar -C TopTen/ .
```

As an example, this was the compilation step in my test:

Run

Make sure the HDFS and YARN services are already running by starting them with these commands (I recommend running them in a terminal as Administrator):
```
start-dfs
start-yarn
```
%HADOOP_HOME%\sbin is already in my PATH, so these commands can be run directly.
Write the text file containing the transactions, for example topten.txt.
Create a directory in HDFS to use as the input directory, for example /input10, by running this command:
```
hadoop fs -mkdir /input10
```
Put your text file into the HDFS directory you just created (it should be empty), for example with this command:
```
hadoop fs -put topten.txt /input10
```
Check that the file is there by listing the directory, for example with this command:
```
hadoop fs -ls /input10
```
It should show something like this:
Run the .jar program with the directory you just created as the input directory and a new, non-existent directory as the output directory, for example with this command:
```
hadoop jar TopTen.jar TopTen /input10 /output10
```
Wait for the job to finish; the output should look something like this:
After it finishes, explore the output directory and cat the file in it; you should find the result there, as in this example:
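A common way to select the top ten is a min-heap capped at ten entries. The following Java sketch shows that technique locally, assuming each input line looks like `<id> <value>` separated by whitespace; the input format and the names `TopTenSketch`, `Txn` and `topN` are assumptions, not taken from TopTen.java:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Local sketch of a top-N selection, ASSUMING each input line looks like
// "<id> <value>" (whitespace-separated). Names here are hypothetical and
// not taken from TopTen.java.
class TopTenSketch {

    static final class Txn {
        final String id;
        final double value;

        Txn(String id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    static final Comparator<Txn> BY_VALUE = Comparator.comparingDouble(t -> t.value);

    // Keeps the n highest-value transactions with a min-heap of size n:
    // the smallest of the current candidates sits at the head and is evicted first.
    static List<Txn> topN(Iterable<String> lines, int n) {
        PriorityQueue<Txn> heap = new PriorityQueue<>(BY_VALUE);
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\\s+");
            heap.add(new Txn(fields[0], Double.parseDouble(fields[1])));
            if (heap.size() > n) {
                heap.poll(); // drop the smallest value currently in the heap
            }
        }
        List<Txn> result = new ArrayList<>(heap);
        result.sort(BY_VALUE.reversed()); // highest value first
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A 50", "B 75", "C 20", "D 90");
        for (Txn t : topN(lines, 10)) {
            System.out.println(t.id + "\t" + t.value);
        }
    }
}
```

The heap never holds more than n + 1 entries, so memory stays small however large the input is. In Hadoop jobs this structure is often kept in each mapper, which emits its local top ten, with a single reducer merging those candidates; this README does not say whether TopTen.java follows exactly that pattern. Ties in value come out in no guaranteed order.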

primayoriko/hadoop-mapreduce-demo
