/bigdata-toolbox

Big data toolbox (Hadoop, Storm, Spark etc.)

Primary language: Puppet

Big data toolbox.

What's inside:

  • 3-node HDP cluster, based on Docker containers and installed with Ambari blueprints:
    • Apache Ambari
    • Apache Falcon
    • Apache Hadoop
    • Apache HBase
    • Apache Hive
    • Apache Oozie
    • Apache Pig
    • Apache Storm
    • Apache Tez
  • Other software:
    • Apache Accumulo
    • Apache Spark
    • Gluster FS
    • Hue

How to use:

  1. Install VirtualBox
  2. Install Vagrant
  3. Clone the repository and start the cluster:
$ git clone https://github.com/petro-rudenko/bigdata-toolbox
$ cd bigdata-toolbox
$ vagrant up
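
Before running `vagrant up`, it may help to confirm that both prerequisites are on the PATH. The sketch below is not part of this repository. The helper name `missing_tools` is hypothetical, and it assumes the VirtualBox command-line tool is installed under its usual name `VBoxManage`.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Assumed executable names for VirtualBox and Vagrant.
    missing = missing_tools(["VBoxManage", "vagrant"])
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("VirtualBox and Vagrant were both found.")
```

An empty result only means the executables were found. It does not check versions or whether VirtualBox can actually start a VM.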

How to customize:

Edit puppet/modules/install/files/3-nodes-bluprint-cluster.json
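
Hand edits work fine, but a change can also be scripted. The sketch below assumes the file follows the usual Ambari blueprint layout: a top-level `host_groups` list whose entries have a `name` and a `components` list of `{"name": ...}` objects. The host group names, the stack version, and the helper `add_component` are hypothetical and not taken from this repository. Check them against the real file first.

```python
import copy
import json


def add_component(blueprint, host_group, component):
    """Return a copy of `blueprint` with `component` added to `host_group`.

    The input is left untouched. Adding a component that is already
    present is a no-op. An unknown host group raises KeyError.
    """
    updated = copy.deepcopy(blueprint)
    for group in updated.get("host_groups", []):
        if group.get("name") == host_group:
            components = group.setdefault("components", [])
            if component not in {c["name"] for c in components}:
                components.append({"name": component})
            return updated
    raise KeyError("host group not found: " + host_group)


# Hypothetical miniature blueprint, used only to illustrate the structure.
example = {
    "Blueprints": {
        "blueprint_name": "demo",
        "stack_name": "HDP",
        "stack_version": "2.1",  # placeholder value, not from the repository
    },
    "host_groups": [
        {"name": "master", "cardinality": "1",
         "components": [{"name": "NAMENODE"}]},
        {"name": "workers", "cardinality": "2",
         "components": [{"name": "DATANODE"}]},
    ],
}

updated = add_component(example, "workers", "HBASE_REGIONSERVER")
print(json.dumps(updated["host_groups"][1], indent=2))
```

To apply the same idea to the real blueprint, load the file with `json.load`, call the helper, and write the result back with `json.dump`. Working on a copy keeps the original available if the new layout fails to deploy.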

How to add custom software:

TODO:

  1. Migrate from Puppet to a master-client tool (probably Salt) in order to synchronize installation of third-party software on the nodes.
  2. Cloud deployment
  3. Save container state to avoid rebuilding the whole system.
  4. Configurable node count and custom blueprints.
  5. Move to Ambari Shell.