Everything you need to deploy a Hadoop cluster on Ubuntu.
This guide covers Hadoop versions up to 2.7.2; configs for future versions might differ.
sudo apt-get update
sudo apt-get install -y default-jdk
- Find download mirror on http://www.apache.org/dyn/closer.cgi/hadoop/common/
- Fetch and untar the binary package
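The fetch-and-untar step might look like this; the Apache archive URL and the /usr/local/hadoop install path are assumptions, so substitute the mirror that closer.cgi suggests for you.

```shell
# Assumed version and mirror -- use the mirror closer.cgi gives you instead.
HADOOP_VERSION=2.7.2
TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
wget -q "https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${TARBALL}" \
  || echo "download failed; fetch ${TARBALL} from a mirror manually"
tar -xzf "${TARBALL}" 2>/dev/null || true
# Move it somewhere stable (assumed install location):
# sudo mv "hadoop-${HADOOP_VERSION}" /usr/local/hadoop
```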
- Add env variables in .bashrc (see bash-env)
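The .bashrc additions might look like the following; the paths are assumptions for a default-jdk install and an untar into /usr/local/hadoop.

```shell
# Append to ~/.bashrc, then run `source ~/.bashrc`.
export JAVA_HOME=/usr/lib/jvm/default-java   # assumed: Ubuntu's default-jdk symlink
export HADOOP_HOME=/usr/local/hadoop         # assumed install location
export HADOOP_PREFIX=$HADOOP_HOME            # the 2.x start scripts refer to HADOOP_PREFIX
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```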
cd $HADOOP_HOME/etc/hadoop
- Change core-site.xml, yarn-site.xml, hdfs-site.xml, hadoop-env.sh (see configs/)
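As a reference point, the smallest useful core-site.xml just points HDFS clients at the NameNode; "master" as the hostname and port 9000 are assumptions here, so match your own hosts file.

```xml
<configuration>
  <property>
    <!-- "master" is an assumed hostname; 9000 is a common choice of port -->
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```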
$HADOOP_PREFIX/bin/hdfs namenode -format
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode        (on master)
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode        (on workers)
$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager   (on master)
$HADOOP_PREFIX/sbin/yarn-daemon.sh start nodemanager       (on workers)
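After starting the daemons, a quick way to confirm they are up is jps; a small check could look like this (the daemon class names are the stock Hadoop ones, with the master/worker split following the commands above):

```shell
# Greps jps output for a daemon's class name; prints a hint if it is absent.
check_daemon() {
  if jps 2>/dev/null | grep -q "$1"; then
    echo "$1 running"
  else
    echo "$1 NOT running - check \$HADOOP_HOME/logs/"
  fi
}
check_daemon NameNode           # master
check_daemon ResourceManager    # master
# on workers: check_daemon DataNode; check_daemon NodeManager
```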
- https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
- http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/
- HDFS/YARN process startup failures? Logs are your best friend: $HADOOP_HOME/logs/
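For example, the NameNode log is usually the first thing to read; the hadoop-&lt;user&gt;-namenode-&lt;host&gt;.log filename pattern below is the stock default for daemons started via hadoop-daemon.sh, so adjust it if you changed the logging setup.

```shell
# Tail the NameNode log; falls back to a hint if no log exists yet.
tail -n 100 "$HADOOP_HOME"/logs/hadoop-*-namenode-*.log 2>/dev/null \
  || echo "no NameNode log found under \$HADOOP_HOME/logs/"
```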
- https://wiki.apache.org/hadoop/TroubleShooting
- To set up a remote cluster on a public cloud, configure ACLs to allow TCP traffic on the ports Hadoop uses (e.g. via a Network Security Group on Azure).
- https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html