This guide walks through installing Hadoop on Windows 7 (for Windows 10, see How to install Hadoop in 5 Steps in Windows 10), breaking the installation down into clear steps.
- I- Download Files
- II- Setup Folders and Files
- III- Setup Environment Variables
- IV- Setup Configuration Files
- V- Testing
- VI- Some Hadoop Commands
- VII- Stop Hadoop
- Download Hadoop hadoop-2.9.2.tar.gz
- Download Java 11
- Download winutils; we will need the files in its hadoop-2.9.2/bin folder
- Create a new folder in C:\ named hadoop. We will extract hadoop-2.9.2.tar.gz into this folder, C:\hadoop\.
- Create 3 folders :
- The first folder, named data, should be created in C:\hadoop\hadoop-2.9.2\, giving C:\hadoop\hadoop-2.9.2\data.
- The second folder, named datanode, should be created in C:\hadoop\hadoop-2.9.2\data\, giving C:\hadoop\hadoop-2.9.2\data\datanode.
- The third folder, named namenode, should also be created in C:\hadoop\hadoop-2.9.2\data\, giving C:\hadoop\hadoop-2.9.2\data\namenode.
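The three folders can also be created from a script. The following is a minimal Python sketch, not part of the original setup; the function name `create_hdfs_data_dirs` is hypothetical, and the example path is the one used in this guide.

```python
from pathlib import Path


def create_hdfs_data_dirs(hadoop_home):
    """Create data, data/datanode and data/namenode under hadoop_home."""
    data = Path(hadoop_home) / "data"
    created = []
    for name in ("datanode", "namenode"):
        folder = data / name
        # parents=True also creates the intermediate data folder;
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Example call with the path used in this guide:
# create_hdfs_data_dirs(r"C:\hadoop\hadoop-2.9.2")
```

Running it twice is harmless, because existing folders are left untouched.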
- Extract the winutils-master.zip file and open the winutils-master folder. We are using Hadoop 2.9.2, so copy all the files from winutils-master\hadoop-2.9.2\bin\ to C:\hadoop\hadoop-2.9.2\bin, replacing any files that already exist there.
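For readers who prefer to script the copy, here is a small Python sketch of the same step. The function name `copy_bin_files` is hypothetical; it copies only the files directly inside the source folder and overwrites files with the same name, which matches the "replace all files" choice made in Explorer.

```python
import shutil
from pathlib import Path


def copy_bin_files(src_bin, dst_bin):
    """Copy every file from src_bin into dst_bin, overwriting same-named files."""
    src, dst = Path(src_bin), Path(dst_bin)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(item, dst / item.name)  # replaces an existing file
            copied.append(item.name)
    return copied


# Example call with the paths used in this guide (adjust the source to where
# winutils-master was extracted):
# copy_bin_files(r"winutils-master\hadoop-2.9.2\bin", r"C:\hadoop\hadoop-2.9.2\bin")
```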
- Set up Java 11
To avoid errors, create a folder in C:\ named java, then extract the jdk-11.0.19_windows-x64_bin.zip file to C:\java (as we did in step II-1-). The extraction produces the folder C:\java\jdk-11.0.19.
Press the Windows key, search for environment variables, then click on edit environment variables for your account.
- JAVA_HOME :
If you don't have a JAVA_HOME variable, click on new to add it.
- Then in the variable name field, type :
JAVA_HOME
- And in the variable value field, type :
C:\java\jdk-11.0.19
- HADOOP_HOME :
- Click on new to add the HADOOP_HOME variable, then in the variable name field, type :
HADOOP_HOME
- And in the variable value field, type :
C:\hadoop\hadoop-2.9.2
- Path :
Scroll to Path and select it, then click on Edit and add this to the beginning of the variable value :
%JAVA_HOME%\bin;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin;
- Verifying :
- Verifying Java : Open cmd and type :
java -version
The output should look like :
java version "11.0.19" 2023-04-18 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.19+9-LTS-224)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.19+9-LTS-224, mixed mode)
- Verifying JAVA_HOME : In cmd type :
echo %JAVA_HOME%
The output should look like :
C:\java\jdk-11.0.19
- Verifying HADOOP_HOME : In cmd type :
echo %HADOOP_HOME%
The output should look like :
C:\hadoop\hadoop-2.9.2
- Verifying PATH : In cmd type :
echo %PATH%
The output should look like :
C:\java\jdk-11.0.19\bin;C:\hadoop\hadoop-2.9.2\bin;C:\hadoop\hadoop-2.9.2\sbin;C:\Use...
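The manual checks above can be combined into one script. This is a hedged Python sketch rather than an official tool: `check_hadoop_env` is a hypothetical helper that takes the environment as a mapping and reports what is missing. It assumes Windows conventions, namely `;` as the PATH separator and case-insensitive paths.

```python
import os


def check_hadoop_env(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    java_home = env.get("JAVA_HOME", "").rstrip("\\")
    hadoop_home = env.get("HADOOP_HOME", "").rstrip("\\")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    # Normalise PATH entries: trim spaces and trailing backslashes, lower-case.
    entries = {
        p.strip().rstrip("\\").lower()
        for p in env.get("PATH", "").split(";")
        if p.strip()
    }
    expected = []
    if java_home:
        expected.append(java_home + "\\bin")
    if hadoop_home:
        expected.append(hadoop_home + "\\bin")
        expected.append(hadoop_home + "\\sbin")
    for path in expected:
        if path.lower() not in entries:
            problems.append("PATH is missing " + path)
    return problems


# On the machine itself, open a new cmd window and run:
# print(check_hadoop_env(os.environ))
```

An empty list means the three variables are consistent with each other. The check does not confirm that the folders actually exist on disk.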
Go to C:\hadoop\hadoop-2.9.2\etc\hadoop to find the files that we will edit.
- Modifying the core-site.xml file :
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>C:\hadoop\hadoop-2.9.2\tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
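All of the Hadoop site files share the same shape: a configuration element holding property entries, each with a name and a value. As an illustration only, the Python sketch below generates such a document from a dictionary; `render_site_xml` is a hypothetical helper, and editing the file by hand as shown above works just as well.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Build a <configuration> document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# The same two properties as in core-site.xml above.
core_site = render_site_xml({
    "hadoop.tmp.dir": r"C:\hadoop\hadoop-2.9.2\tmp",
    "fs.default.name": "hdfs://localhost:9000",
})
print(core_site)
```

The returned string can be written to C:\hadoop\hadoop-2.9.2\etc\hadoop\core-site.xml. The same function covers the other site files when given their properties.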
- Modifying the hdfs-site.xml file :
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop\hadoop-2.9.2\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop\hadoop-2.9.2\data\datanode</value>
</property>
</configuration>
- Modifying the mapred-site.xml file :
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Modifying the yarn-site.xml file :
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
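A typo in one of these files is a common reason for a daemon failing to start, so it can help to read the settings back after editing. The following Python sketch is one possible way to do that. The helper names `parse_site_xml` and `missing_settings` are hypothetical, and the sample string simply repeats the mapred-site.xml content from above.

```python
import xml.etree.ElementTree as ET


def parse_site_xml(xml_text):
    """Return the name -> value pairs of a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }


def missing_settings(actual, expected):
    """List the expected settings that are absent or have a different value."""
    return sorted(
        name for name, value in expected.items() if actual.get(name) != value
    )


sample = """<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>"""

settings = parse_site_xml(sample)
print(missing_settings(settings, {"mapreduce.framework.name": "yarn"}))
```

To check a real file, read its text first, for example with `pathlib.Path(...).read_text()`, and pass the result to `parse_site_xml`.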
- Verifying the hadoop-env.cmd file :
- Make sure that JAVA_HOME is set correctly :
set JAVA_HOME=%JAVA_HOME%
- Verifying Hadoop : Run this command in cmd :
hadoop version
- Formatting the NameNode : Only before starting Hadoop for the first time, run :
hdfs namenode -format
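- Start Hadoop : First run :
start-dfs
Two windows will pop up, one for the NameNode and one for the DataNode.
Then run :
start-yarn
Two more windows will pop up, one for the ResourceManager and one for the NodeManager.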
- jps : To verify that everything is running, run :
jps
The output should look like :
6896 NodeManager
6388 ResourceManager
8488 DataNode
2028 NameNode
9788 Jps
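The process IDs will differ on every machine; what matters is that the four daemon names are present. As a rough sketch, the Python snippet below checks a block of jps output for them. `missing_daemons` is a hypothetical helper, and the expected names are taken from the sample output above.

```python
EXPECTED_DAEMONS = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}


def missing_daemons(jps_output):
    """Return the expected Hadoop daemons that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # each useful line looks like "<pid> <name>"
            running.add(parts[1])
    return sorted(EXPECTED_DAEMONS - running)


# To check the live system, feed it the real output:
# import subprocess
# output = subprocess.run(["jps"], capture_output=True, text=True).stdout
# print(missing_daemons(output))
```

An empty list means all four daemons were found.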
- Verify the NameNode : In a browser, open this link :
http://localhost:50070
- Verify the ResourceManager : In a browser, open this link :
http://localhost:8088
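The two web pages can also be probed without a browser. This is a minimal Python sketch under the assumption that both interfaces answer a plain HTTP GET on the ports shown above. `is_reachable` and `report` are hypothetical names, and only a status of 200 is treated as "up".

```python
import urllib.error
import urllib.request

WEB_UIS = {
    "NameNode": "http://localhost:50070",
    "ResourceManager": "http://localhost:8088",
}


def is_reachable(url, timeout=5):
    """Return True if an HTTP GET on url answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status or malformed URL.
        return False


def report():
    """Print one line per web interface."""
    for name, url in WEB_UIS.items():
        print(name, "up" if is_reachable(url) else "down")
```

Call `report()` after start-dfs and start-yarn have finished starting. The daemons can take a few seconds to open their ports, so an early "down" may just mean it is too soon.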
- Create a folder in Hadoop HDFS : Let's start by creating a folder :
hadoop fs -mkdir /aissam_data
Then list it :
hadoop fs -ls /
To see it in the browser, open this link :
http://localhost:50070
When we click on aissam_data, we see that it is empty.
- Copy example data to it :
I have a file on my computer named myData.txt that contains this text :
windows data Big
windows 7 installation BigData
10 guid Big Data
TESTY TEST
Windows
So, let's copy it to Hadoop :
hdfs dfs -put C:\hadoop\myData.txt /aissam_data/data_.txt
To list it :
hdfs dfs -ls /
hdfs dfs -ls /aissam_data
To see its size :
hadoop fs -du /aissam_data/data_.txt
To see its content :
hadoop fs -cat /aissam_data/data_.txt
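The commands in this section can be chained in a script. The sketch below is one hedged way to do it in Python: `upload_commands` only builds the argument lists, mirroring the commands shown above, and `run_all` executes them. Both names are hypothetical. It assumes hadoop and hdfs are on PATH, as set up in step III.

```python
import shutil
import subprocess


def upload_commands(local_file, hdfs_dir, hdfs_name):
    """Return the argument lists for the HDFS commands used above, in order."""
    target = hdfs_dir.rstrip("/") + "/" + hdfs_name
    return [
        ["hadoop", "fs", "-mkdir", hdfs_dir],
        ["hdfs", "dfs", "-put", local_file, target],
        ["hdfs", "dfs", "-ls", hdfs_dir],
        ["hadoop", "fs", "-du", target],
        ["hadoop", "fs", "-cat", target],
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for argv in commands:
        # shutil.which resolves hadoop/hdfs to the .cmd scripts found on PATH.
        program = shutil.which(argv[0])
        if program is None:
            raise FileNotFoundError(argv[0] + " was not found on PATH")
        subprocess.run([program] + argv[1:], check=True)


# Example with the names used in this guide:
# run_all(upload_commands(r"C:\hadoop\myData.txt", "/aissam_data", "data_.txt"))
```

Because `check=True` stops at the first failing command, the script halts if /aissam_data already exists: `-mkdir` fails in that case.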
- For more commands, see /2_hadoop_commands\hadoop_commands.txt
To stop Hadoop, run :
stop-dfs
Then run :
stop-yarn