Vagrant project to spin up a single virtual machine running:
- Hadoop 2.7.5
- Hive 2.3.2
- Spark 2.2.1
The virtual machine will be running the following services:
- HDFS NameNode + DataNode
- YARN ResourceManager + JobHistoryServer + ProxyServer
- Hive metastore and server2
- Spark history server
- Download and install VirtualBox 5.1.x
- Download and install Vagrant
- Download and install Git for Windows (Windows only)
- Open Terminal or Git Bash
- Run `vagrant box add centos/7`
- Go to releases and download and extract the latest source of this project.
- Run `chmod -R 755 vagrant-hadoop-spark-hive`, where "vagrant-hadoop-spark-hive" is the name of the project folder (Linux and Mac only).
- In your terminal, change into the project directory (i.e. `cd vagrant-hadoop-spark-hive`).
- Run `vagrant up` to create the VM.
- Use `vagrant ssh` to log in to the VM.
- When inside the shell, use `sudo -s` to switch to root, which is how you should access Hadoop, Spark and Hive services.
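Once logged in as root, one way to confirm that the Hadoop daemons came up is to compare the output of `jps` (the JDK tool that lists running Java processes) against the names you expect. The sketch below is only an illustration: the helper `missing_services` and the `EXPECTED` list are hypothetical, it assumes `jps` is on the `PATH` inside the VM, and it covers only the HDFS and YARN daemons whose process names match the service names listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this project.
# Process names as printed in the second column of `jps` output (assumed list).
EXPECTED="NameNode DataNode ResourceManager JobHistoryServer"

# Reads a `jps` listing on stdin and prints every expected name that is absent.
missing_services() {
  local listing svc
  listing=$(cat)
  for svc in $EXPECTED; do
    printf '%s\n' "$listing" | awk '{print $2}' | grep -qx "$svc" || echo "$svc"
  done
}
```

Inside the VM, `jps | missing_services` prints nothing when all four daemons are running, and otherwise prints one missing name per line.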
Here are some useful links to the various web UIs:
- YARN resource manager: (http://10.211.55.101:8088)
- Job history: (http://10.211.55.101:19888/jobhistory/)
- HDFS: (http://10.211.55.101:50070/dfshealth.html)
- Spark history server: (http://10.211.55.101:18080)
- Spark context UI (if a Spark context is running): (http://10.211.55.101:4040)
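To check from the host that these pages are reachable, a small script can request each URL and report the HTTP status. This is a rough sketch, not part of the project: the function names `ui_urls` and `check_uis` and the `VM_IP` override are made up for illustration, and it assumes `curl` is installed on the host. The Spark context UI is left out because it only exists while a Spark context is running.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this project.
# IP address taken from the links above; set VM_IP if your setup differs.
VM_IP="${VM_IP:-10.211.55.101}"

# Print one "name url" pair per line for the always-on UIs.
ui_urls() {
  cat <<EOF
yarn-resource-manager http://${VM_IP}:8088
job-history http://${VM_IP}:19888/jobhistory/
hdfs http://${VM_IP}:50070/dfshealth.html
spark-history-server http://${VM_IP}:18080
EOF
}

# Request each URL (5 second timeout) and print its HTTP status code.
# curl reports 000 when no connection could be made.
check_uis() {
  ui_urls | while read -r name url; do
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")
    printf '%-24s %s %s\n' "$name" "$code" "$url"
  done
}
```

After sourcing the script, run `check_uis` on the host once `vagrant up` has finished. A status of `000` suggests the VM or the service is not reachable.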
To test out the virtual machine setup, and for examples of how to run MapReduce, Hive and Spark, head on over to VALIDATING.md.
After a system restart, all services should start automatically.
If you'd like to learn more about working with and optimizing Vagrant, take a look at ADVANCED.md.
The file DEVELOP.md contains some tips for developers.
- Keep the project up to date with current versions of services such as Hadoop, Spark and Hive
- Consider adding other useful services related to Hadoop, Spark and Hive
- Work towards provisioning multiple boxes that work together over a local network
- Make the .sh scripts work for provisioning the environment on a regular CentOS machine such as a server
This project is based on the great work carried out at (https://github.com/vangj/vagrant-hadoop-2.4.1-spark-1.0.1).