This project aims to deploy a datalake and its ecosystem using Ansible and/or Vagrant.
Currently, this project does the following:
Deploy a full data lake using virtual machines running on your computer:
* An HDFS cluster: 1 namenode and 2 datanodes
* A metadata management system: 1 GeoNetwork
- Ansible
- Install Ansible on your own computer (for Ubuntu or Debian):
apt-get install ansible
- If you connect to your remote machine with a password instead of an SSH key (a key is recommended), you also have to install this package:
apt-get install sshpass
- Vagrant with VirtualBox
- Install VirtualBox on your own computer (for Ubuntu or Debian):
apt-get install virtualbox
- Install Vagrant on your own computer (for Ubuntu or Debian):
apt-get install vagrant
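Before going further, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of this project: the function name `missing_tools` is made up for illustration, and the VirtualBox command-line tool is assumed to be installed as `VBoxManage`.

```shell
#!/bin/sh
# Print the name of every command given as an argument that is NOT found on PATH.
# `missing_tools` is a hypothetical helper, not something shipped by this project.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumption: VirtualBox provides the `VBoxManage` command on your system.
missing=$(missing_tools ansible vagrant VBoxManage)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All prerequisites found."
fi
```

If the script lists a tool, install it with the matching `apt-get install` command above and run the check again.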
Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It runs on many Unix-like systems, and can configure both Unix-like systems and Microsoft Windows. It includes its own declarative language to describe system configuration.
Ansible is a good tool for deploying and maintaining IT systems. Because it is based on YAML configuration files, it makes it easy to describe your configuration and share it with your collaborators. You can then deploy that configuration to your infrastructure; all you need is SSH access to your servers.
An Ansible playbook is a list of system instructions to be sent to a machine. That is why you only need two things:
- Ansible installed on your own computer
- A remote machine (physical, VMware, VirtualBox, Docker, LXC, ...) running an SSH server
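As an illustration of the two requirements above, the sketch below writes a small INI inventory and runs Ansible's ad-hoc `ping` module to check SSH access. The file name `inventory.ini`, the group name `datalake`, the helper `write_inventory`, and the two datanode addresses are assumptions made for this example; only 10.0.0.10 appears elsewhere in this README, as the default namenode address.

```shell
#!/bin/sh
# Write an INI-style Ansible inventory containing one group and the given hosts.
# Usage: write_inventory FILE GROUP HOST...
# (`write_inventory` is a hypothetical helper used only in this example.)
write_inventory() {
    file=$1
    group=$2
    shift 2
    {
        printf '[%s]\n' "$group"
        for host in "$@"; do
            printf '%s\n' "$host"
        done
    } > "$file"
}

# 10.0.0.10 is the default namenode address; the two others are assumed.
write_inventory inventory.ini datalake 10.0.0.10 10.0.0.11 10.0.0.12

# Check that Ansible can reach every host over SSH (skipped if Ansible is absent).
if command -v ansible >/dev/null 2>&1; then
    ansible datalake -i inventory.ini -m ping || echo "Some hosts are unreachable."
fi
```

If you log in with a password rather than an SSH key, adding `--ask-pass` to the `ansible` command makes it prompt for the password, which is the case where `sshpass` is needed.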
Vagrant is an open-source software product for building and maintaining portable virtual software development environments, e.g. for VirtualBox, KVM, Hyper-V, Docker containers, VMware, and AWS. It tries to simplify the software configuration management of virtualizations in order to increase development productivity. Vagrant is written in the Ruby language, but its ecosystem supports development in a few languages.
Vagrant manages your virtual machines (VMs) from the command line. The benefits are:
- Quickly create a VM with a known, controlled environment
- Restore your VM to a known state
- Distribute your VMs easily
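The three benefits above map onto a handful of Vagrant subcommands. The sketch below only prints them by default, so it is safe to run anywhere; the `run` wrapper, the `DRY_RUN` variable, the snapshot name `clean`, and the box file name `datalake.box` are choices made for this example, not something this project defines. It assumes a single-VM Vagrantfile; with several machines, as in this project, `vagrant package` will probably need a machine name as well.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN is set to 0.
# (`run` and DRY_RUN are conventions of this example, not Vagrant features.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

vm_lifecycle() {
    # Create and boot the VM described in the Vagrantfile.
    run vagrant up
    # Record the current state under the name "clean".
    run vagrant snapshot save clean
    # Later: roll the VM back to that known state.
    run vagrant snapshot restore clean
    # Export the VM as a box file that can be shared with others.
    run vagrant package --output datalake.box
}

vm_lifecycle
```

Setting `DRY_RUN=0` before calling `vm_lifecycle` executes the commands for real instead of printing them.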
Vagrant and Ansible can be combined to create, deploy, and maintain your VMs, as we do in this project.
You only need VirtualBox and Vagrant installed on your computer. This project creates the VMs that you need for your data lake.
- Set your nodes' IP addresses in the Vagrantfile. Inside this file, edit your network settings (the name of your network interface adapter and the DNS option)
- Declare those IP addresses for the Ansible provisioning in vars
- Configure your own computer to reach your nodes by hostname (needed to access the Hadoop web UI)
vim /etc/hosts
- From the command line, start your VMs from the vagrant/cluster directory:
vagrant up
- Format HDFS:
- SSH to the namenode
- From the command line, as user hadoop, change directory and format HDFS
sudo su hadoop
cd /usr/local/hadoop/bin/
hdfs namenode -format
- Start the HDFS daemon on your cluster:
- SSH to the namenode
- From the command line, as root, start the hadoop service
sudo systemctl start hadoop
- Work in progress: systemd reports that something went wrong, but the cluster works anyway.
- Verify that your cluster is up:
- On your own device, open a web browser
- Go to [IP-of-your-namenode]:9870; with the default settings: http://10.0.0.10:9870
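The hostname mapping and the final check from the steps above can also be sketched in shell. This is an illustration under stated assumptions rather than the project's own tooling: the helpers `add_host_entry` and `namenode_ui_url`, the host name `namenode`, and the file `hosts.example` are hypothetical, and the example writes to a local copy instead of `/etc/hosts` so that nothing is changed without review.

```shell
#!/bin/sh
# Append "IP NAME" to a hosts file unless NAME is already listed there.
# Usage: add_host_entry FILE IP NAME   (hypothetical helper for this example)
add_host_entry() {
    file=$1 ip=$2 name=$3
    if ! grep -qwF -- "$name" "$file" 2>/dev/null; then
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}

# Build the namenode web UI URL; 10.0.0.10 and 9870 are the defaults given above.
namenode_ui_url() {
    printf 'http://%s:%s\n' "${1:-10.0.0.10}" "${2:-9870}"
}

# Work on a local copy; the real file is /etc/hosts and needs root to edit.
hosts_file=./hosts.example
add_host_entry "$hosts_file" 10.0.0.10 namenode   # host name is an assumption

# Report whether the web UI answers (requires curl and a running cluster).
if curl -s -o /dev/null --max-time 5 "$(namenode_ui_url)"; then
    echo "namenode UI is reachable"
else
    echo "namenode UI is not reachable"
fi
```

Once the generated lines look right, they can be copied into `/etc/hosts` by hand, using the host names your nodes actually have.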
work in progress
Aidmoit's Collect is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses