These are the dependencies I used to compile and run these programs:
- Hadoop 3.3.0
- Hadoop 3.3.0 fixes, which I've included in `hadoop-3.3.0-configs.zip` in this project
- JDK 8 (Java Development Kit 8)
- Windows 10 as the operating system
I should say that this is my first try with Hadoop. I found it a bit buggy, and it needed a lot of manual configuration, so here is my advice for you:
- Run `start-dfs` and `start-yarn` as ADMIN. When I did not run them as ADMIN, the HDFS and YARN services couldn't run properly.
- Word Count: counts the occurrences of every word (separated by whitespace) in a text file.
- Average: computes the average transaction value for every ID (each ID can have multiple transactions).
- Top Ten: finds the ten transactions with the highest value.
- Navigate to the `WordCount` folder to make the following steps easier.
- Compile the `WordCount.java` file and link it with the needed Hadoop libraries in `%HADOOP_HOME%\share\hadoop`, for example with `javac -classpath "D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\common\*";"D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\mapreduce\*" -d WordCount/ WordCount.java`
- Create a jar file from the compiled classes, for example with `jar -cvf WordCount.jar -C WordCount/ .` As an example, this is the compilation step from my test.
- Make sure the HDFS and YARN services have already been started by running `start-dfs` and `start-yarn` (my advice is to run them in a terminal opened as ADMINISTRATOR). I've already added `%HADOOP_HOME%\sbin` to the PATH, so these commands should work.
- Write the text file whose words you want to count, for example `wordcount.txt`, which I create here.
- Create a directory in HDFS to serve as the input directory, for example `/input9`, by running `hadoop fs -mkdir /input9`
- Put your text file into the HDFS directory you just created (it should be empty), for example with `hadoop fs -put wordcount.txt /input9`. Make sure the file is there with the `ls` command, for example `hadoop fs -ls /input9`. It should show something like this.
- Run the Hadoop `.jar` program. The input directory is the directory you just created, and the output directory must be a new, non-existent directory, for example `hadoop jar WordCount.jar WordCount /input9 /output9`. Wait for the job to finish; it should show something like this.
- After the job finishes, explore the output directory and `cat` the file there. You should find the result in it, as in this example.
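To sanity-check the counts in the output file, it can help to compare them against a plain in-memory count of the same text. The sketch below is not the project's `WordCount.java` and uses no Hadoop classes. It only illustrates the computation described above (split on whitespace, count each word). The class name `WordCountSketch` and the sample text are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the word-count computation (no Hadoop involved).
// The class name and the sample text are illustrative assumptions.
class WordCountSketch {

    // Splits the text on whitespace and counts how often each token occurs.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps keys sorted for readable output
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) { // an empty or all-whitespace input yields one empty token
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not the contents of wordcount.txt.
        count("hello hadoop hello world")
                .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

If the job output and this sketch disagree for the same input, the difference is most likely in how words are split (for example punctuation or letter case), which this sketch does not normalize.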
- Navigate to the `Average` folder to make the following steps easier.
- Compile the `Average.java` file and link it with the needed Hadoop libraries in `%HADOOP_HOME%\share\hadoop`, for example with `javac -classpath "D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\common\*";"D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\mapreduce\*" -d Average/ Average.java`
- Create a jar file from the compiled classes, for example with `jar -cvf Average.jar -C Average/ .` As an example, this is the compilation step from my test.
- Make sure the HDFS and YARN services have already been started by running `start-dfs` and `start-yarn` (my advice is to run them in a terminal opened as ADMINISTRATOR). I've already added `%HADOOP_HOME%\sbin` to the PATH, so these commands should work.
- Write the text file to be processed, for example `average.txt`, which I create here.
- Create a directory in HDFS to serve as the input directory, for example `/input11`, by running `hadoop fs -mkdir /input11`
- Put your text file into the HDFS directory you just created (it should be empty), for example with `hadoop fs -put average.txt /input11`. Make sure the file is there with the `ls` command, for example `hadoop fs -ls /input11`. It should show something like this.
- Run the Hadoop `.jar` program. The input directory is the directory you just created, and the output directory must be a new, non-existent directory, for example `hadoop jar Average.jar Average /input11 /output11`. Wait for the job to finish; it should show something like this.
- After the job finishes, explore the output directory and `cat` the file there. You should find the result in it, as in this example.
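For checking the averages in the output file, here is a rough plain-Java sketch of the same computation: group transaction values by ID, then divide each sum by its count. It is not the project's `Average.java` and uses no Hadoop classes. The input layout (one `<id> <value>` pair per line, separated by whitespace) is an assumption, since the format of `average.txt` is not described here, and the class name `AverageSketch` and the sample data are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of a per-ID average (no Hadoop involved).
// ASSUMPTION: each line looks like "<id> <value>", separated by whitespace.
class AverageSketch {

    static Map<String, Double> averagePerId(String[] lines) {
        // For each ID keep a two-element accumulator: [sum of values, number of values].
        Map<String, double[]> sumAndCount = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            double[] acc = sumAndCount.computeIfAbsent(parts[0], id -> new double[2]);
            acc[0] += Double.parseDouble(parts[1]);
            acc[1] += 1;
        }
        Map<String, Double> averages = new TreeMap<>();
        sumAndCount.forEach((id, acc) -> averages.put(id, acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not the contents of average.txt.
        String[] lines = {"A 10", "B 4", "A 20"};
        averagePerId(lines).forEach((id, avg) -> System.out.println(id + "\t" + avg));
    }
}
```

Keeping a running sum and count per ID, rather than averaging partial averages, gives the correct result when an ID has several transactions.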
- Navigate to the `TopTen` folder to make the following steps easier.
- Compile the `TopTen.java` file and link it with the needed Hadoop libraries in `%HADOOP_HOME%\share\hadoop`, for example with `javac -classpath "D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\common\*";"D:\ProgramData\Hadoop\hadoop-3.3.0\share\hadoop\mapreduce\*" -d TopTen/ TopTen.java`
- Create a jar file from the compiled classes, for example with `jar -cvf TopTen.jar -C TopTen/ .` As an example, this is the compilation step from my test.
- Make sure the HDFS and YARN services have already been started by running `start-dfs` and `start-yarn` (my advice is to run them in a terminal opened as ADMINISTRATOR). I've already added `%HADOOP_HOME%\sbin` to the PATH, so these commands should work.
- Write the text file to be processed, for example `topten.txt`, which I create here.
- Create a directory in HDFS to serve as the input directory, for example `/input10`, by running `hadoop fs -mkdir /input10`
- Put your text file into the HDFS directory you just created (it should be empty), for example with `hadoop fs -put topten.txt /input10`. Make sure the file is there with the `ls` command, for example `hadoop fs -ls /input10`. It should show something like this.
- Run the Hadoop `.jar` program. The input directory is the directory you just created, and the output directory must be a new, non-existent directory, for example `hadoop jar TopTen.jar TopTen /input10 /output10`. Wait for the job to finish; it should show something like this.
- After the job finishes, explore the output directory and `cat` the file there. You should find the result in it, as in this example.
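To cross-check the top-ten output, a small in-memory selection of the largest values can be compared with the job's result. The sketch below is not the project's `TopTen.java` and uses no Hadoop classes. It shows one common way to keep the N largest values, a bounded min-heap, which may or may not be how the project does it. The class name `TopTenSketch`, the method `topN`, and the sample values are hypothetical, and the sketch works on bare transaction values because the format of `topten.txt` is not described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java sketch of a top-N selection (no Hadoop involved).
// ASSUMPTION: the input is just a list of transaction values.
class TopTenSketch {

    // Returns the n largest values, largest first.
    static List<Double> topN(List<Double> values, int n) {
        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap: smallest kept value on top
        for (double v : values) {
            heap.add(v);
            if (heap.size() > n) {
                heap.poll(); // drop the smallest so at most n values remain
            }
        }
        List<Double> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not the contents of topten.txt.
        List<Double> values = Arrays.asList(5.0, 42.0, 7.5, 19.0, 3.0, 88.0,
                61.0, 2.0, 14.0, 27.0, 33.0, 9.0);
        topN(values, 10).forEach(System.out::println);
    }
}
```

The heap never holds more than N + 1 values at a time, so memory use stays small even for a large input.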