Install Hadoop in Windows 7

This guide walks through installing Hadoop 2.9.2 on Windows 7, broken down into clear steps (for Windows 10, see How to install Hadoop in 5 Steps in Windows 10).


I- Download Files

  • hadoop-2.9.2: hadoop-2.9.2.tar.gz
  • Java 11: jdk-11.0.19_windows-x64_bin.zip
  • winutils: winutils-master.zip

II- Setup Folders and Files

  1. Create a new folder named hadoop in C:\ (C:\hadoop).

hadoop_folders_0

Extract hadoop-2.9.2.tar.gz into C:\hadoop\, which creates the folder C:\hadoop\hadoop-2.9.2:

extract_hadoop_1

extract_hadoop_2

  2. Create 3 folders:
  • The first folder, data, goes in C:\hadoop\hadoop-2.9.2\, giving C:\hadoop\hadoop-2.9.2\data.

hadoop_folders_1

  • The second folder, datanode, goes in C:\hadoop\hadoop-2.9.2\data\, giving C:\hadoop\hadoop-2.9.2\data\datanode.
  • The third folder, namenode, also goes in C:\hadoop\hadoop-2.9.2\data\, giving C:\hadoop\hadoop-2.9.2\data\namenode.

hadoop_folders_2
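The three folders can also be created with a short script. This is a minimal Python sketch (Python 3.9+), assuming Hadoop was extracted to C:\hadoop\hadoop-2.9.2 as in step II-1; create_data_folders is a hypothetical helper name, not part of Hadoop.

```python
import sys
from pathlib import Path


def create_data_folders(hadoop_dir: Path) -> list[Path]:
    """Create data\\namenode and data\\datanode under hadoop_dir.

    parents=True also creates the intermediate "data" folder, and
    exist_ok=True means running the script twice is harmless.
    """
    created = []
    for name in ("namenode", "datanode"):
        folder = hadoop_dir / "data" / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__" and sys.platform == "win32":
    # Assumes Hadoop was extracted to C:\hadoop\hadoop-2.9.2 (step II-1)
    for folder in create_data_folders(Path(r"C:\hadoop\hadoop-2.9.2")):
        print("ready:", folder)
```

Because of exist_ok=True, running it again after the folders exist changes nothing.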

  3. Extract the winutils-master.zip file:

extract_winutils_1

Open the winutils-master folder:

extract_winutils_2

We are using Hadoop 2.9.2:

extract_winutils_3

so we copy all the files in its bin folder, winutils-master\hadoop-2.9.2\bin\:

copy_bin_from

into C:\hadoop\hadoop-2.9.2\bin, replacing the existing files:

copy_bin_to
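Copying the winutils files can be scripted the same way. A minimal sketch (Python 3.9+): copy_bin_files is a hypothetical helper, and the winutils-master location in the example is an assumption, since it depends on where you extracted the zip. It copies only files, not sub-folders, and replaces files of the same name, like the manual copy.

```python
import shutil
import sys
from pathlib import Path


def copy_bin_files(src_bin: Path, dst_bin: Path) -> list[str]:
    """Copy every file in src_bin into dst_bin and return the copied names.

    shutil.copy2 overwrites a file of the same name in dst_bin,
    which gives the "replace all files" behaviour of this step.
    """
    copied = []
    for item in sorted(src_bin.iterdir()):
        if item.is_file():
            shutil.copy2(item, dst_bin / item.name)
            copied.append(item.name)
    return copied


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical location of the extracted archive; adjust to yours
    src = Path(r"C:\Users\me\Downloads\winutils-master\hadoop-2.9.2\bin")
    dst = Path(r"C:\hadoop\hadoop-2.9.2\bin")
    print(copy_bin_files(src, dst))
```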

  4. Set up Java 11: to avoid errors (Hadoop's Windows scripts tend to fail when the Java path contains spaces, as in C:\Program Files), create a folder named java in C:\, then extract the jdk-11.0.19_windows-x64_bin.zip file into C:\java, as we did in step II-1:

extract_java_11

Here is the result of the extraction:

java11_in_java_folder
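The extraction can also be done from Python. A minimal sketch (Python 3.9+), assuming the JDK zip sits at a hypothetical download path; extract_jdk is a hypothetical helper. It returns the archive's top-level folder names, which should be a single folder such as jdk-11.0.19, the folder that JAVA_HOME points to in section III.

```python
import sys
import zipfile
from pathlib import Path


def extract_jdk(zip_path: Path, target_dir: Path) -> list[str]:
    """Extract zip_path into target_dir; return the archive's top-level names.

    The zip format uses "/" as the separator in member names, so the
    first component of each name is its top-level folder.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        top_level = {name.split("/")[0] for name in archive.namelist()}
    return sorted(top_level)


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical download location of the JDK zip
    zip_file = Path(r"C:\Users\me\Downloads\jdk-11.0.19_windows-x64_bin.zip")
    print(extract_jdk(zip_file, Path(r"C:\java")))
```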

III- Setup Environment Variables

Press the Windows key, search for "environment variables", then click "Edit environment variables for your account":

win_key

  1. JAVA_HOME: if you don't already have a JAVA_HOME variable, click New to add it:

click_new_envirenment_variable

  • In Variable name, type:
JAVA_HOME
  • In Variable value, type:
C:\java\jdk-11.0.19

JAVA_HOME_envirenment_variable

  2. HADOOP_HOME:
  • Click New to add a HADOOP_HOME variable, then in Variable name type:
HADOOP_HOME
  • In Variable value, type:
C:\hadoop\hadoop-2.9.2

HADOOP_HOME_envirenment_variable

  3. Path: scroll to Path, select it, click Edit, then add the following at the beginning of the variable value:
%JAVA_HOME%\bin;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin;

Path_envirenment_variable
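Editing Path by hand makes it easy to add the same entries twice. The hypothetical Python helper below (Python 3.9+) only computes the new value to paste into the Edit dialog and does not change any setting: the three Hadoop entries go first and older copies of them are dropped.

```python
def prepend_path_entries(current: str, entries: list[str]) -> str:
    """Return a new Path value that starts with `entries`.

    Older copies of the same entries are removed (compared without regard
    to case, as Windows does) so repeating the step does not grow Path.
    """
    wanted = {entry.lower() for entry in entries}
    rest = [p for p in current.split(";") if p and p.lower() not in wanted]
    return ";".join(entries + rest)


HADOOP_PATH_ENTRIES = [r"%JAVA_HOME%\bin", r"%HADOOP_HOME%\bin", r"%HADOOP_HOME%\sbin"]

if __name__ == "__main__":
    # Hypothetical existing Path value used only for illustration
    print(prepend_path_entries(r"C:\Users\me\bin", HADOOP_PATH_ENTRIES))
```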

  4. Verifying:
  • Verifying Java: open a new cmd window (one opened before the change will not see the new variables) and type:
java -version

The output should look like:

java version "11.0.19" 2023-04-18 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.19+9-LTS-224)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.19+9-LTS-224, mixed mode)

verifying_1_java-version

  • Verifying JAVA_HOME: in cmd, type:
echo %JAVA_HOME%

The output should look like:

C:\java\jdk-11.0.19

verifying_2_JAVA_HOME

  • Verifying HADOOP_HOME: in cmd, type:
echo %HADOOP_HOME%

The output should look like:

C:\hadoop\hadoop-2.9.2

verifying_3_HADOOP_HOME

  • Verifying PATH: in cmd, type:
echo %PATH%

The output should look like:

C:\java\jdk-11.0.19\bin;C:\hadoop\hadoop-2.9.2\bin;C:\hadoop\hadoop-2.9.2\sbin;C:\Use...

verifying_4_PATH
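The same checks can be run in one go. This Python sketch (Python 3.9+) takes an environment mapping, such as os.environ inside a new cmd window, and reports what is missing; check_hadoop_env is a hypothetical helper. It only requires the bin and sbin folders to be present somewhere in PATH, which is looser than the sample output above, where they come first.

```python
import ntpath
import os
import sys


def check_hadoop_env(env) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    java_home = env.get("JAVA_HOME", "")
    hadoop_home = env.get("HADOOP_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")

    # %PATH% holds expanded entries; Windows paths are case-insensitive
    entries = {p.rstrip("\\").lower() for p in env.get("PATH", "").split(";") if p}
    required = []
    if java_home:
        required.append(ntpath.join(java_home, "bin"))
    if hadoop_home:
        required += [ntpath.join(hadoop_home, "bin"), ntpath.join(hadoop_home, "sbin")]
    for entry in required:
        if entry.rstrip("\\").lower() not in entries:
            problems.append("PATH is missing " + entry)
    return problems


if __name__ == "__main__" and sys.platform == "win32":
    # Run from a NEW cmd window so the updated variables are visible
    for line in check_hadoop_env(os.environ) or ["environment looks OK"]:
        print(line)
```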

IV- Setup Configuration Files

Go to C:\hadoop\hadoop-2.9.2\etc\hadoop, which contains the files we will edit.

configuration_files

  1. Modify the core-site.xml file (fs.defaultFS is the current name of the deprecated fs.default.name property):
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>C:\hadoop\hadoop-2.9.2\tmp</value>
    </property>

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
  2. Modify the hdfs-site.xml file:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:\hadoop\hadoop-2.9.2\data\namenode</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:\hadoop\hadoop-2.9.2\data\datanode</value>
  </property>
</configuration>
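To catch a typo in hdfs-site.xml before formatting the NameNode, a small Python script (Python 3.9+) can read the file and check that the namenode and datanode folders from step II-2 exist. read_hadoop_config and missing_storage_dirs are hypothetical helpers, and the check assumes each value is a single local path like the ones above.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def read_hadoop_config(path: Path) -> dict[str, str]:
    """Parse a Hadoop *-site.xml file into a {name: value} dict."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def missing_storage_dirs(props: dict[str, str]) -> list[str]:
    """Return configured NameNode/DataNode folders that do not exist yet.

    Assumes each value is a single plain local path, as in this guide,
    not a comma-separated list or a file:// URI.
    """
    keys = ("dfs.namenode.name.dir", "dfs.datanode.data.dir")
    return [props[k] for k in keys if k in props and not Path(props[k]).is_dir()]


if __name__ == "__main__" and sys.platform == "win32":
    config = read_hadoop_config(Path(r"C:\hadoop\hadoop-2.9.2\etc\hadoop\hdfs-site.xml"))
    print(missing_storage_dirs(config) or "storage folders found")
```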
  3. Modify the mapred-site.xml file (if the folder only contains mapred-site.xml.template, copy it to mapred-site.xml first):
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
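If you prefer to generate these files instead of editing them by hand, ElementTree from the Python standard library can write the same <configuration> layout. A minimal sketch (Python 3.9+ for ET.indent); render_hadoop_config is a hypothetical helper, and writing its result replaces the whole file, so include every property the file needs.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def render_hadoop_config(props: dict[str, str]) -> str:
    """Build the text of a Hadoop *-site.xml file from {name: value} pairs.

    ElementTree escapes special characters such as & and <, and
    ET.indent (Python 3.9+) adds the line breaks and indentation.
    """
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__" and sys.platform == "win32":
    target = Path(r"C:\hadoop\hadoop-2.9.2\etc\hadoop\mapred-site.xml")
    # Overwrites the whole file, including any existing properties
    target.write_text(render_hadoop_config({"mapreduce.framework.name": "yarn"}), encoding="utf-8")
```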
  4. Modify the yarn-site.xml file:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
  5. Check the hadoop-env.cmd file:
  • Make sure that JAVA_HOME is set correctly:
    set JAVA_HOME=%JAVA_HOME%
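The hadoop-env.cmd check can be scripted as well. This Python sketch (Python 3.9+) finds the last non-comment set JAVA_HOME= line, expands %JAVA_HOME% from the environment, and warns about spaces in the result, the kind of path this guide avoids by installing Java in C:\java. The helper names are hypothetical, and only the %JAVA_HOME% form is expanded.

```python
import os
import re
import sys
from pathlib import Path

# A non-comment "set JAVA_HOME=..." line; "@rem" lines do not match
SET_JAVA_HOME = re.compile(r"^\s*set\s+JAVA_HOME=(.*)$", re.IGNORECASE)


def java_home_setting(cmd_text: str):
    """Return the value of the last 'set JAVA_HOME=' line, or None."""
    value = None
    for line in cmd_text.splitlines():
        match = SET_JAVA_HOME.match(line)
        if match:
            value = match.group(1).strip()
    return value


def resolve_java_home(value: str, env) -> str:
    """Expand %JAVA_HOME% (case-insensitive, like cmd) using env."""
    # A function replacement keeps backslashes in Windows paths literal
    return re.sub(r"%JAVA_HOME%", lambda _m: env.get("JAVA_HOME", ""), value, flags=re.IGNORECASE)


if __name__ == "__main__" and sys.platform == "win32":
    env_cmd = Path(os.environ["HADOOP_HOME"], "etc", "hadoop", "hadoop-env.cmd")
    setting = java_home_setting(env_cmd.read_text(errors="replace"))
    resolved = resolve_java_home(setting or "", os.environ)
    print("JAVA_HOME used by Hadoop:", resolved or "(not set)")
    if " " in resolved:
        print("warning: the path contains a space")
```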

V- Testing

  1. Verify Hadoop: run this command in cmd:
hadoop version

hadoop_version

  2. Format the NameNode: only the first time, before starting Hadoop, run (formatting again later wipes the NameNode metadata, so files already stored in HDFS are lost):
hdfs namenode -format
  3. Start Hadoop: first run:
start-dfs

and two windows will pop up (NameNode and DataNode):

start_hadoop_1

Then run:

start-yarn

and two more windows will pop up (ResourceManager and NodeManager):

start_hadoop_2

  4. jps: to verify that everything is running, run:
jps

The output should look like this (process IDs will differ):

6896 NodeManager
6388 ResourceManager
8488 DataNode
2028 NameNode
9788 Jps

jps
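Instead of reading the jps list by eye, a small Python sketch (Python 3.9+) can compare it with the four daemons started above. missing_daemons is a hypothetical helper; it parses lines of the form "<process id> <class name>", as in the sample output.

```python
import shutil
import subprocess
import sys

# The four daemons started by start-dfs and start-yarn in this guide
EXPECTED_DAEMONS = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}


def missing_daemons(jps_output: str) -> set[str]:
    """Return the expected daemons that do not appear in jps output.

    Each jps line is "<process id> <main class name>".
    """
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return EXPECTED_DAEMONS - running


if __name__ == "__main__" and sys.platform == "win32" and shutil.which("jps"):
    out = subprocess.run(["jps"], capture_output=True, text=True).stdout
    print(sorted(missing_daemons(out)) or "all four daemons are running")
```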

  5. Verify the NameNode: in a browser, open this link:
http://localhost:50070

namenode_browser

  6. Verify the ResourceManager: in a browser, open this link:
http://localhost:8088

Resourcemanger_browser

VI- Some Hadoop Commands

  1. Create a folder in HDFS: let's start by creating a folder:
hadoop fs -mkdir /aissam_data

and list the root folder to see it:

hadoop fs -ls /

mkdir_ls

To see it in the browser, open this link, then choose Utilities > Browse the file system:

http://localhost:50070

namenode_browser_1

namenode_browser_2

Clicking aissam_data shows that it is empty:

namenode_browser_3

  2. Copy example data into it: I have a file on my computer, C:\hadoop\myData.txt, containing this text:
windows data Big
windows 7 installation BigData
10 guid Big Data
TESTY TEST
Windows

Let's copy it to HDFS:

hdfs dfs -put C:\hadoop\myData.txt /aissam_data/data_.txt

data_to_hadoop_1

To list it:

hdfs dfs -ls /
hdfs dfs -ls /aissam_data

data_to_hadoop_2

To see its size (in bytes):

hadoop fs -du /aissam_data/data_.txt

data_to_hadoop_3

To see its contents:

hadoop fs -cat /aissam_data/data_.txt

data_to_hadoop_4
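To confirm the upload is complete, compare the size reported by hadoop fs -du with the local file size: -put copies the bytes unchanged, so the two should be equal. A minimal Python sketch; du_size and same_size are hypothetical helpers, and only the first field of the -du line (the size in bytes) is used.

```python
import os


def du_size(du_line: str) -> int:
    """Return the first field of a 'hadoop fs -du' output line.

    That field is the file size in bytes; any further columns
    are ignored so the check does not depend on them.
    """
    return int(du_line.split()[0])


def same_size(local_file: str, du_line: str) -> bool:
    """True when the local file and the HDFS copy have the same byte count."""
    return os.path.getsize(local_file) == du_size(du_line)
```

For example, call same_size(r"C:\hadoop\myData.txt", line) with line set to the output of hadoop fs -du /aissam_data/data_.txt.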

  • For more commands, see /2_hadoop_commands\hadoop_commands.txt

VII- Stop Hadoop

To stop Hadoop, first run:

stop-dfs

stop_hadoop_1

Then run:

stop-yarn

stop_hadoop_2