puppet-storm

Wirbelsturm-compatible Puppet module to deploy Storm 0.9+ clusters.

You can use this Puppet module to deploy Storm to physical and virtual machines, for instance via your existing internal or cloud-based Puppet infrastructure, or via a tool such as Vagrant for local and remote deployments.


Quick start

See section Usage below.

Features

  • Supports Storm 0.9+, i.e. the latest stable release version.
  • Only supports Netty as Storm's messaging backend, which is the default backend since Storm 0.9. Support for the legacy ZeroMQ backend was deliberately removed from this module.
  • Decouples code (Puppet manifests) from configuration data (Hiera) through the use of Puppet parameterized classes, i.e. class parameters. Hence you should use Hiera to control how Storm is deployed and to which machines.
  • Supports RHEL OS family (e.g. RHEL 6, CentOS 6, Amazon Linux).
    • Code contributions to support additional OS families are welcome!
  • Storm is run under process supervision via supervisord version 3.0+.

Requirements and assumptions

  • A Storm cluster requires a ZooKeeper quorum (1, 3, 5, or more ZooKeeper instances) for proper functioning. Take a look at puppet-zookeeper to deploy such a ZooKeeper quorum for use with Storm.

  • This module requires that the target machines to which you are deploying Storm have yum repositories configured for pulling the Storm package (i.e. RPM).

  • This module requires that the target machines have a Java JRE/JDK installed (e.g. via a separate Puppet module such as puppetlabs-java). You may also want to make sure that the Java package is installed before Storm to prevent startup problems.

    • Because different teams may have different approaches to installing "base" packages such as Java, this module intentionally does not puppet-require Java directly.
    • Note: Based on our own experience we strongly discourage the use of OpenJDK 6. We ran into many weird errors with it. If you need Java 6, use Oracle/Sun Java 6.
  • This module requires the following additional Puppet modules:

    • puppetlabs/stdlib
    • puppet-supervisor

    It is recommended that you add these modules to your Puppet setup via librarian-puppet. See the Puppetfile snippet in section Installation below for a starting example.

  • When using Vagrant: Depending on your Vagrant box (image) you may need to manually configure/disable firewall settings -- otherwise machines may not be able to talk to each other. One option to manage firewall settings is via puppetlabs-firewall.
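
The ZooKeeper and Java requirements above can be checked on a target machine before the first Puppet run. The following is a minimal sketch and not part of this module: the function names are hypothetical, and it assumes that nc (netcat) is installed, that ZooKeeper listens on its default client port 2181, and that the ZooKeeper "ruok" four-letter command is enabled on the server.

```shell
# check_zookeeper HOST [PORT]: succeeds if the ZooKeeper server at HOST
# answers "imok" to the "ruok" four-letter command (default port: 2181).
check_zookeeper() {
  reply="$(printf 'ruok' | nc -w 2 "$1" "${2:-2181}" 2>/dev/null)"
  [ "$reply" = "imok" ]
}

# check_java: succeeds if a java binary is found on the PATH.
check_java() {
  command -v java >/dev/null 2>&1
}
```

For example, `check_zookeeper zookeeper1 && check_java && echo "ready for Storm"` prints the message only if both checks pass.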

Installation

It is recommended to use librarian-puppet to add this module to your Puppet setup.

Add the following lines to your Puppetfile:

# Add the stdlib dependency as hosted on public Puppet Forge.
#
# We intentionally do not include the stdlib dependency in our Modulefile to make it easier for users who decided to
# use internal copies of stdlib so that their deployments are not coupled to the availability of PuppetForge.  While
# there are tools such as puppet-library for hosting internal forges or for proxying to the public forge, not everyone
# is actually using those tools.
mod 'puppetlabs/stdlib'

# Add the puppet-storm module
mod 'storm',
  :git => 'https://github.com/miguno/puppet-storm.git'

# Add the puppet-supervisor module dependency
mod 'supervisor',
  :git => 'https://github.com/miguno/puppet-supervisor.git'

Then use librarian-puppet to install (or update) the Puppet modules.
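
As a minimal sketch of that step (not part of this module), the wrapper below runs librarian-puppet from the directory that contains your Puppetfile. The function name sync_modules and the LIBRARIAN_PUPPET variable are hypothetical helpers introduced here for illustration; only the install and update subcommands come from librarian-puppet itself.

```shell
# Command to run. Override LIBRARIAN_PUPPET (hypothetical variable), e.g. with
# "echo", to see which subcommand would be run without executing it.
LIBRARIAN_PUPPET="${LIBRARIAN_PUPPET:-librarian-puppet}"

# sync_modules MODE [DIR]: run "librarian-puppet MODE" in DIR, where MODE is
# "install" or "update" and DIR (default: current directory) holds the Puppetfile.
sync_modules() {
  mode="$1"
  dir="${2:-.}"
  case "$mode" in
    install|update) ;;
    *) echo "usage: sync_modules install|update [dir]" >&2; return 2 ;;
  esac
  (cd "$dir" && $LIBRARIAN_PUPPET "$mode")
}
```

Typical usage is `sync_modules install` after editing the Puppetfile, and `sync_modules update` when you want to pull newer versions of the modules.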

Configuration

  • See init.pp for the list of currently supported configuration parameters. These should be self-explanatory.
  • See params.pp for the default values of those configuration parameters.

Of special note is the class parameter $config_map: You can use this parameter to "inject" arbitrary Storm config settings via Hiera/YAML into the Storm configuration file (default name: storm.yaml). However you should not re-define config settings via $config_map that already have explicit Puppet class parameters (such as $nimbus_host, $worker_childopts). See the examples below for more information on $config_map usage.

Usage

IMPORTANT: Make sure you read and follow the Requirements and assumptions section above. Otherwise the examples below will not work.

Configuration examples

Using Hiera

A "full" single-node example that includes the deployment of supervisord via puppet-supervisor and ZooKeeper via puppet-zookeeper. Here, both ZooKeeper and Storm (Logviewer, Nimbus, Supervisor, UI, DRPC) are running on the same machine called stormsingle1. That's a nice setup for your local development laptop or CI server, for instance.

---
classes:
  - storm::drpc
  - storm::logviewer
  - storm::nimbus
  - storm::supervisor
  - storm::ui
  - supervisor
  - zookeeper::service

# Custom Storm settings
storm::nimbus_host: 'stormsingle1'
storm::zookeeper_servers:
  - 'stormsingle1'
storm::drpc_childopts:       '-Xmx256m -Djava.net.preferIPv4Stack=true'
storm::logviewer_childopts:  '-Xmx128m -Djava.net.preferIPv4Stack=true'
storm::nimbus_childopts:     '-Xmx256m -Djava.net.preferIPv4Stack=true'
storm::ui_childopts:         '-Xmx256m -Djava.net.preferIPv4Stack=true'
storm::supervisor_childopts: '-Xmx256m -Djava.net.preferIPv4Stack=true'
storm::worker_childopts:     '-Xmx256m -Djava.net.preferIPv4Stack=true'
storm::supervisor_slots_ports:
  - 6700
  - 6701
storm::storm_messaging_transport: "backtype.storm.messaging.netty.Context"
storm::config_map:
  nimbus.thrift.threads: 12
  storm.messaging.netty.server_worker_threads: 1
  storm.messaging.netty.client_worker_threads: 1
  storm.messaging.netty.buffer_size: 5242880
  storm.messaging.netty.max_retries: 100
  storm.messaging.netty.max_wait_ms: 1000
  storm.messaging.netty.min_wait_ms: 100
storm::drpc_servers:
  - 'stormsingle1'

Of course you can (and normally will) use multiple Storm nodes. In that case you will typically run Storm Nimbus and Storm UI on the "master" machine, and a Storm Supervisor daemon on each of the "slave" machines in the Storm cluster. You will also typically have a dedicated ZooKeeper quorum. In small deployments you can instead use a single ZooKeeper instance that is co-located with the Storm Nimbus/UI daemons on the master machine.

Storm master node example, assuming the master node is called nimbus1 and the ZooKeeper server is called zookeeper1:

---
classes:
  - storm::nimbus
  - storm::ui
  - supervisor

## Custom Storm settings
storm::zookeeper_servers:
  - 'zookeeper1'
storm::nimbus_host: 'nimbus1'
storm::nimbus_childopts:     '-Xmx1024m -Djava.net.preferIPv4Stack=true'
storm::ui_childopts:         '-Xmx512m  -Djava.net.preferIPv4Stack=true'
# Add shell environment variables to the environment of the Nimbus process
storm::nimbus::service_environment: 'FOO="bar",HELLO="world"'

Storm slave node example:

---
classes:
  - storm::logviewer
  - storm::supervisor
  - supervisor

## Custom Storm settings
storm::zookeeper_servers:
  - 'zookeeper1'
storm::logviewer_childopts:  '-Xmx128m -Djava.net.preferIPv4Stack=true'
storm::nimbus_host: 'nimbus1'
storm::supervisor_childopts: '-Xmx256m  -Djava.net.preferIPv4Stack=true'
storm::worker_childopts:     '-Xmx1024m -Djava.net.preferIPv4Stack=true'
storm::supervisor_slots_ports:
  - 6700
  - 6701
  - 6702
  - 6703

Using Puppet manifests

Note: It is recommended to use Hiera to control deployments instead of using this module in your Puppet manifests directly.

TBD

Service management

To manually start, stop, restart, or check the status of the Storm daemons, respectively:

$ sudo supervisorctl [start|stop|restart|status] [storm-drpc|storm-logviewer|storm-nimbus|storm-supervisor|storm-ui]

Example:

$ sudo supervisorctl status
storm-drpc                       RUNNING    pid 7490, uptime 0:05:34
storm-logviewer                  RUNNING    pid 7491, uptime 0:05:17
storm-nimbus                     RUNNING    pid 7491, uptime 0:05:12
storm-ui                         RUNNING    pid 7421, uptime 0:05:26
storm-supervisor                 RUNNING    pid 7507, uptime 0:05:03
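
Status output in this format is easy to filter. The following sketch is not part of this module and uses hypothetical function names: it lists the Storm programs that are not in the RUNNING state and restarts them. It assumes the program names start with "storm-", as in the example above, and that supervisorctl and sudo are available on the machine.

```shell
# not_running: read "supervisorctl status" output on stdin and print the names
# of Storm programs whose state (second column) is anything other than RUNNING.
not_running() {
  awk '$1 ~ /^storm-/ && $2 != "RUNNING" { print $1 }'
}

# restart_not_running: restart every Storm program reported as not running.
restart_not_running() {
  for program in $(sudo supervisorctl status | not_running); do
    sudo supervisorctl restart "$program"
  done
}
```

Running `sudo supervisorctl status | not_running` on its own only reports the affected programs without restarting anything.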

Log files

Note: The locations below may be different depending on the Storm RPM you are actually using.

  • Storm log files: /var/log/storm/*
  • Supervisord log files related to Storm processes:
    • /var/log/supervisor/storm-drpc/storm-drpc.out
    • /var/log/supervisor/storm-drpc/storm-drpc.err
    • /var/log/supervisor/storm-nimbus/storm-nimbus.out
    • /var/log/supervisor/storm-nimbus/storm-nimbus.err
    • /var/log/supervisor/storm-supervisor/storm-supervisor.out
    • /var/log/supervisor/storm-supervisor/storm-supervisor.err
    • /var/log/supervisor/storm-ui/storm-ui.out
    • /var/log/supervisor/storm-ui/storm-ui.err
  • Supervisord main log file: /var/log/supervisor/supervisord.log
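
When a daemon fails to start, its supervisord stderr log is usually the first place to look. The helpers below are a small sketch with hypothetical function names, assuming each daemon writes to /var/log/supervisor/<daemon>/<daemon>.out and .err; adjust the paths if your Storm RPM or supervisord setup uses different locations.

```shell
# supervisor_logs DAEMON: print the stdout and stderr log paths of one daemon.
supervisor_logs() {
  printf '%s\n' "/var/log/supervisor/$1/$1.out" "/var/log/supervisor/$1/$1.err"
}

# tail_errors DAEMON [LINES]: show the last LINES (default: 50) lines of the
# daemon's stderr log. Uses sudo because the log files may not be world-readable.
tail_errors() {
  sudo tail -n "${2:-50}" "/var/log/supervisor/$1/$1.err"
}
```

For example, `tail_errors storm-nimbus 100` shows the last 100 lines of the Nimbus stderr log.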

Development

It is recommended to run the bootstrap script after a fresh checkout:

$ ./bootstrap

You have access to a bunch of rake commands to help you with module development and testing:

$ bundle exec rake -T
rake acceptance          # Run acceptance tests
rake build               # Build puppet module package
rake clean               # Clean a built module package
rake coverage            # Generate code coverage information
rake help                # Display the list of available rake tasks
rake lint                # Check puppet manifests with puppet-lint / Run puppet-lint
rake module:bump         # Bump module version to the next minor
rake module:bump_commit  # Bump version and git commit
rake module:clean        # Runs clean again
rake module:push         # Push module to the Puppet Forge
rake module:release      # Release the Puppet module, doing a clean, build, tag, push, bump_commit and git push
rake module:tag          # Git tag with the current module version
rake spec                # Run spec tests in a clean fixtures directory
rake spec_clean          # Clean up the fixtures directory
rake spec_prep           # Create the fixtures directory
rake spec_standalone     # Run spec tests on an existing fixtures directory
rake syntax              # Syntax check Puppet manifests and templates
rake syntax:hiera        # Syntax check Hiera config files
rake syntax:manifests    # Syntax check Puppet manifests
rake syntax:templates    # Syntax check Puppet templates
rake test                # Run syntax, lint, and spec tests

Of particular interest are:

  • rake test -- runs syntax, lint, and spec tests
  • rake syntax -- checks that you have valid Puppet and Ruby ERB syntax
  • rake lint -- checks against the Puppet Style Guide
  • rake spec -- runs unit tests

TODO

  • Restrict disk space used by logviewer log files.
  • Enhance in-line documentation of Puppet manifests.
  • Add more unit tests and specs.
  • Add rollback/remove functionality to completely purge Storm related packages and configuration files from a machine.

Change log

See CHANGELOG.

Contributing to puppet-storm

Code contributions, bug reports, feature requests etc. are all welcome.

If you are new to GitHub please read Contributing to a project for how to send patches and pull requests to puppet-storm.

License

Copyright © 2014 Michael G. Noll

See LICENSE for licensing information.

References

Puppet modules similar to this module:

  • wikimedia/puppet-kafka -- focuses on Debian as the target OS, and apparently also supports Kafka mirroring and jmxtrans monitoring (the latter for sending JVM and Kafka broker metrics to tools such as Ganglia or Graphite)

The test setup of this module was derived from: