Hadoop

  • Modify the local hostname:

    vi /etc/hostname
  • Create the hadoop user and group (on every node)

    • Create the group
    sudo addgroup hadoop_group
    • Create a dedicated Hadoop account (the steps below assume it is named hadoop_admin)
    sudo adduser --ingroup hadoop_group [username]
    • Grant the hadoop_admin account sudo privileges (sudo visudo is the safer way to edit this file, since it checks the syntax before saving)
      sudo vi /etc/sudoers
      Add this line below the existing rules:
      hadoop_admin ALL=(ALL:ALL) ALL
    • Switch to the hadoop admin account just created
      su hadoop_admin
  • Generate an SSH key pair:

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   
    • scp authorized_keys to each slave (see the example at the end of this list)
    • If there is no .ssh directory yet, run ssh localhost once and log out; the directory will then exist
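    • For example, assuming the account hadoop_admin and the slave hostnames slave1/slave2 defined in /etc/hosts further below (adjust to your own names):
      scp ~/.ssh/authorized_keys hadoop_admin@slave1:~/.ssh/
      scp ~/.ssh/authorized_keys hadoop_admin@slave2:~/.ssh/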
  • Install Java

    • Download JDK 1.8
    • Extract it
    tar zxvf jdk-8u251-linux-x64.tar.gz
    • Move the folder to the home directory
    mv jdk1.8.0_251 ~/
    • Edit ~/.bashrc
      # Find these lines
      
      # enable programmable completion features (you don't need to enable
      # this, if it's already enabled in /etc/bash.bashrc and /etc/profile
      # sources /etc/bash.bashrc).
      if ! shopt -oq posix; then
        if [ -f /usr/share/bash-completion/bash_completion ]; then
          . /usr/share/bash-completion/bash_completion
        elif [ -f /etc/bash_completion ]; then
          . /etc/bash_completion
        fi
      fi
      
      
      export JAVA_HOME=/home/{username}/jdk1.8.0_251   # add these two lines (adjust the path)
      export PATH=$JAVA_HOME/bin:$PATH            # add these two lines
    • Reload ~/.bashrc: source ~/.bashrc
    • Test the installation
    java -version
  • Install Hadoop

    • Extract it
    tar zxvf hadoop-2.10.0.tar.gz
    • Move it to the home directory
    mv hadoop-2.10.0 ~/
    • Edit core-site.xml
    cd ~
    cd hadoop-2.10.0/etc/hadoop/
    vim core-site.xml
    • Find the following block:
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    </configuration>           
    • Replace it with the following properties:
      fs.defaultFS (the URI clients use to reach HDFS; port 9000 here)
      hadoop.tmp.dir (where temporary files are kept; the default is /tmp/hadoop-hadoop)
      Example:

      <configuration>
           <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/tmp</value>
                <description>Abase for other temporary directories.</description>
           </property>
           <property>
                <name>fs.defaultFS</name>
                <value>hdfs://{master}:9000</value>  <!-- adjust the hostname -->
           </property>
      </configuration>
    • Edit hdfs-site.xml

      cd ~/hadoop-2.10.0/etc/hadoop/
      vim hdfs-site.xml
    • Likewise, set the following:
      dfs.namenode.secondary.http-address (HTTP address of the Secondary NameNode)
      dfs.namenode.name.dir (directory where the NameNode stores its metadata)
      dfs.datanode.data.dir (directory where each DataNode stores its blocks)
      dfs.replication (number of replicas kept for each block)
      Example:

      <configuration>
            <property>
                    <name>dfs.namenode.secondary.http-address</name>  <!-- optional -->
                    <value>master:50090</value>
            </property>
            <property>
                    <name>dfs.namenode.name.dir</name>
                    <value>/home/hadoop/hdfs/namenode</value>   <!-- adjust the path -->
            </property>
            <property>
                    <name>dfs.datanode.data.dir</name>
                    <value>/home/hadoop/hdfs/datanode</value>
            </property>
            <property>
                    <name>dfs.replication</name>
                    <value>3</value>
            </property>
      </configuration>
    • Edit the slaves file (~/hadoop-2.10.0/etc/hadoop/slaves) and list every node that should run a DataNode:

      master
      slave1
      slave2
    • In /etc/hosts, map each machine's IP to its hostname (on every node):

    127.0.0.1       localhost
    192.xxx.x.xxx   master
    192.xxx.x.xxx   slave1
    192.xxx.x.xxx   slave2
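    • With the host entries and the copied keys in place, passwordless login from the master can be verified, for example:
    ssh slave1    # should log in without a password prompt; type exit to return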
    • Edit hadoop-env.sh and append the following at the bottom:
      export JAVA_HOME=/home/spark/jdk1.8.0_251  # adjust the path
      export PATH=$JAVA_HOME/bin:$PATH
      
      export HADOOP_HOME=/home/spark/hadoop-2.10.0
      export PATH=$HADOOP_HOME/bin:$PATH
      
      export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop  
    • Edit ~/.bashrc and append the following (remember to source it afterwards):
      export HADOOP_HOME=/home/spark/hadoop-2.10.0   # adjust the path
      export PATH=$HADOOP_HOME/bin:$PATH
      
      export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    • Create the directories referenced above (the hadoop.tmp.dir plus the hdfs namenode and datanode dirs)
    cd ~
    mkdir -p tmp hdfs/namenode hdfs/datanode
    • scp the whole hadoop folder, the JDK folder, and ~/.bashrc to every slave (remember to source ~/.bashrc on every node; see the example after the command below):
    scp -r hadoop-2.10.0 xxx@xxx.xxx.x.xxx:~
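    • The JDK folder and ~/.bashrc can be copied the same way, for example:
    scp -r jdk1.8.0_251 xxx@xxx.xxx.x.xxx:~
    scp ~/.bashrc xxx@xxx.xxx.x.xxx:~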
    • Format HDFS (run on the master; in Hadoop 2.x, hdfs namenode -format is the non-deprecated form of the command below):
    hadoop namenode -format
    • Start HDFS:
    cd ~
    cd hadoop-2.10.0/sbin
    ./start-dfs.sh
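    • A quick check that the daemons are up (jps ships with the JDK installed earlier):
    jps    # the master should list NameNode and SecondaryNameNode (plus DataNode, since master is also in the slaves file); each slave should list DataNode
    # the NameNode web UI is served on http://master:50070 by default in Hadoop 2.x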
  • Install YARN (optional):

    • Edit yarn-site.xml
    <configuration>
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
      </property>
    </configuration>
    • Copy mapred-site.xml.template to mapred-site.xml and edit it (commands after this list):
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
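    • Hadoop only reads mapred-site.xml, so copy the template first, then start YARN with its own script; a minimal sketch:
    cd ~/hadoop-2.10.0/etc/hadoop
    cp mapred-site.xml.template mapred-site.xml
    cd ~/hadoop-2.10.0/sbin
    ./start-yarn.sh    # starts the ResourceManager on the master and a NodeManager on each slave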

# Reference: https://medium.com/@sleo1104/hadoop-3-2-0-%E5%AE%89%E8%A3%9D%E6%95%99%E5%AD%B8%E8%88%87%E4%BB%8B%E7%B4%B9-22aa183be33a