petro-rudenko/bigdata-toolbox

Big data toolbox (Hadoop, Storm, Spark etc.)

Big data toolbox.

What's inside:

A 3-node HDP cluster based on Docker containers, installed with Ambari blueprints:
- Apache Ambari
- Apache Falcon
- Apache Hadoop
- Apache HBase
- Apache Hive
- Apache Oozie
- Apache Pig
- Apache Storm
- Apache Tez
Other software:
- Apache Accumulo
- Apache Spark
- Gluster FS
- Hue

How to use:

Install VirtualBox
Install Vagrant

$ git clone https://github.com/petro-rudenko/bigdata-toolbox
$ cd bigdata-toolbox
$ vagrant up
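
Before running the commands above, it can help to confirm that the prerequisites are actually on the PATH. The following is a minimal sketch, not part of the repository: the function names check_tool and check_prereqs are hypothetical, and it assumes VirtualBox and Vagrant expose their usual command-line names (VBoxManage and vagrant).

```shell
#!/bin/sh
# Hypothetical helper: report whether a single command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Check every tool the "How to use" steps rely on.
# Assumption: VBoxManage is the CLI that ships with VirtualBox.
check_prereqs() {
    rc=0
    for tool in git VBoxManage vagrant; do
        check_tool "$tool" || rc=1
    done
    return $rc
}

check_prereqs || echo "Install the missing tools before running 'vagrant up'."
```

Once the machines are up, `vagrant status` lists their state.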

How to customize:

Edit the cluster blueprint file puppet/modules/install/files/3-nodes-bluprint-cluster.json
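
A hand-edited blueprint is easy to break with a stray comma or bracket, so a quick syntax check before rebuilding the cluster may save time. This is only a sketch: validate_json is a hypothetical helper, it assumes python3 is installed on the host, and it checks JSON syntax only, not whether Ambari will accept the blueprint.

```shell
#!/bin/sh
# Path as given in this README; adjust if the repository layout differs.
BLUEPRINT="puppet/modules/install/files/3-nodes-bluprint-cluster.json"

# Hypothetical helper: return 0 if the file parses as JSON, non-zero otherwise.
# Assumption: python3 is available; json.tool is part of its standard library.
validate_json() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}

# Only run the check when executed from the repository root.
if [ -f "$BLUEPRINT" ]; then
    if validate_json "$BLUEPRINT"; then
        echo "blueprint OK"
    else
        echo "blueprint is not valid JSON" >&2
    fi
fi
```

Whether an edited blueprint is picked up by re-provisioning or needs a full rebuild of the machines is not stated here, so treat that step as something to verify.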

How to add custom software:

TODO:

Migrate from Puppet to a master-client tool (probably Salt) in order to synchronize the installation of third-party software across nodes.
Cloud deployment.
Save container state to avoid rebuilding the whole system.
Configurable node count and custom blueprints.
Move to Ambari Shell.