puppet-kafka

Wirbelsturm-compatible Puppet module to deploy Kafka 0.8+ servers/brokers.

You can use this Puppet module to deploy Kafka to physical and virtual machines, for instance via your existing internal or cloud-based Puppet infrastructure and via a tool such as Vagrant for local and remote deployments.

Table of Contents

Quick start
Features
Requirements and assumptions
Installation
Configuration
Usage
Custom ZooKeeper chroot (experimental)
Development
TODO
Change log
Contributing
License
References

Quick start

See section Usage below.

Features

Supports Kafka 0.8+, i.e. the latest stable release version.
Decouples code (Puppet manifests) from configuration data (Hiera) through the use of Puppet parameterized classes, i.e. class parameters. Hence you should use Hiera to control how Kafka is deployed and to which machines.
Supports RHEL OS family (e.g. RHEL 6, CentOS 6, Amazon Linux).
- Code contributions to support additional OS families are welcome!
Supports tuning of system-level configuration such as the maximum number of open files (cf. /etc/security/limits.conf) to optimize the performance of your Kafka deployments.
Kafka is run under process supervision via supervisord version 3.0+.

Requirements and assumptions

A Kafka cluster requires a ZooKeeper quorum (1, 3, 5, or more ZooKeeper instances) for proper functioning. Take a look at puppet-zookeeper to deploy such a ZooKeeper quorum for use with Kafka.
This module requires that the target machines to which you are deploying Kafka have yum repositories configured for pulling the Kafka package (i.e. RPM).
- We provide wirbelsturm-rpm-kafka so that you can conveniently build such an RPM yourself.
- Because we run Kafka via supervisord through puppet-supervisor, the supervisord RPM must be available, too. See puppet-supervisor for details.
This module requires that the target machines have a Java JRE/JDK installed (e.g. via a separate Puppet module such as puppetlabs-java). You may also want to make sure that the Java package is installed before Kafka to prevent startup problems.
- Because different teams may have different approaches to installing "base" packages such as Java, this module intentionally does not puppet-require Java directly.
- Take a look at LinkedIn's Java setup for Kafka.
This module requires the following additional Puppet modules:
- puppetlabs/stdlib
- puppet-limits
- puppet-supervisor
It is recommended that you add these modules to your Puppet setup via librarian-puppet. See the Puppetfile snippet in section Installation below for a starting example.
When using Vagrant: Depending on your Vagrant box (image) you may need to manually configure/disable firewall settings -- otherwise machines may not be able to talk to each other. One option to manage firewall settings is via puppetlabs-firewall.
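
The odd quorum sizes named in the first requirement above follow from ZooKeeper's majority rule: an ensemble stays available only while more than half of its servers are up. The following is a minimal sketch of that arithmetic. The helper names are illustrative and are not part of this module.

```ruby
# Illustrative helpers (not part of puppet-kafka): majority arithmetic for a
# ZooKeeper ensemble of a given size.

# Smallest number of servers that forms a majority.
def majority(ensemble_size)
  ensemble_size / 2 + 1
end

# Number of servers that may fail while a majority remains.
def tolerated_failures(ensemble_size)
  ensemble_size - majority(ensemble_size)
end

[1, 3, 4, 5].each do |n|
  puts "#{n} servers: majority=#{majority(n)}, tolerated failures=#{tolerated_failures(n)}"
end
```

A fourth server tolerates no more failures than three do, which is why even ensemble sizes are usually avoided.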

Installation

It is recommended to use librarian-puppet to add this module to your Puppet setup.

Add the following lines to your Puppetfile:

# Add the stdlib dependency as hosted on public Puppet Forge.
#
# We intentionally do not include the stdlib dependency in our Modulefile to make it easier for users who decided to
# use internal copies of stdlib so that their deployments are not coupled to the availability of PuppetForge.  While
# there are tools such as puppet-library for hosting internal forges or for proxying to the public forge, not everyone
# is actually using those tools.
mod 'puppetlabs/stdlib', '>= 4.1.0'

# Add the puppet-kafka module
mod 'kafka',
  :git => 'https://github.com/miguno/puppet-kafka.git'

# Add the puppet-limits and puppet-supervisor module dependencies
mod 'limits',
  :git => 'https://github.com/miguno/puppet-limits.git'

mod 'supervisor',
  :git => 'https://github.com/miguno/puppet-supervisor.git'

Then use librarian-puppet to install (or update) the Puppet modules.

Configuration

See init.pp and broker.pp for the list of currently supported configuration parameters. These should be self-explanatory.
See params.pp for the default values of those configuration parameters.

Of special note is the class parameter $config_map: you can use this parameter to "inject" arbitrary Kafka config settings via Hiera/YAML into the Kafka broker configuration file (default name: server.properties). However, you should not use $config_map to re-define config settings that already have explicit Puppet class parameters (such as $broker_id). See the examples below for more information on $config_map usage.
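
To illustrate the idea behind $config_map, here is a minimal sketch of how explicit settings and injected settings could be combined into the lines of server.properties. It is not the module's actual template. The function name, the RESERVED_KEYS list, and the assumption that $broker_id maps to the Kafka setting broker.id are all hypothetical.

```ruby
# Hypothetical sketch, not the module's real template logic.
# Settings assumed (for illustration) to be owned by explicit class parameters.
RESERVED_KEYS = %w[broker.id zookeeper.connect].freeze

# Merge explicit settings with the injected config_map and render
# "key=value" lines, refusing to override settings that are reserved.
def render_server_properties(explicit, config_map)
  clashes = config_map.keys & RESERVED_KEYS
  unless clashes.empty?
    raise ArgumentError, "config_map must not re-define: #{clashes.join(', ')}"
  end
  explicit.merge(config_map).sort.map { |key, value| "#{key}=#{value}" }.join("\n")
end

puts render_server_properties(
  { 'broker.id' => 0, 'zookeeper.connect' => 'zookeeper1:2181' },
  { 'log.roll.hours' => 48, 'log.retention.hours' => 48 }
)
```

The sketch rejects clashing keys outright rather than letting one side silently win, which mirrors the advice above not to re-define settings that have their own class parameters.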

Usage

IMPORTANT: Make sure you read and follow the Requirements and assumptions section above. Otherwise, the examples below will not work.

Configuration examples

Using Hiera

A "full" single-node example that includes the deployment of supervisord via puppet-supervisor and ZooKeeper via puppet-zookeeper. Here, both ZooKeeper and Kafka are running on the same machine. The Kafka broker will listen on port 9092/tcp and will connect to the ZooKeeper server running at localhost:2181. That's a nice setup for your local development laptop or CI server, for instance.

---
classes:
  - kafka::service
  - supervisor
  - zookeeper::service

A more sophisticated example that overrides some of the default settings and also demonstrates the use of $config_map. In this example, the broker connects to the ZooKeeper server zookeeper1. Take a look at Kafka's Java/JVM configuration notes as well as recommended production configurations.

---
classes:
  - kafka::service
  - supervisor

## Kafka
kafka::broker_id: 0
kafka::config_map:
  log.roll.hours: 48
  log.retention.hours: 48
kafka::kafka_heap_opts: '-Xms2G -Xmx2G -XX:NewSize=256m -XX:MaxNewSize=256m'
kafka::kafka_opts: '-XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintTenuringDistribution'
kafka::zookeeper_connect:
  - 'zookeeper1:2181'

# Optional: Manage /etc/security/limits.conf to tune the maximum number
# of open files, which is a typical setting you must change for Kafka
# production environments.  Default: false (do not manage)
kafka::limits_manage: true
kafka::limits_nofile: 65536

Using Puppet manifests

Note: It is recommended to use Hiera to control deployments instead of using this module in your Puppet manifests directly.

TBD

Service management

To manually start, stop, restart, or check the status of the Kafka broker service, respectively:

$ sudo supervisorctl [start|stop|restart|status] kafka-broker

Example:

$ sudo supervisorctl status
kafka-broker                          RUNNING    pid 16461, uptime 3 days, 09:22:38
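
For scripted health checks, a status line like the one above can be split into its columns. The following is a minimal sketch that assumes the three-column layout of the sample line (name, state, details). The helper names are illustrative, and the sketch parses a given string rather than invoking supervisorctl itself.

```ruby
# Illustrative parser (not part of puppet-kafka) for one line of
# `supervisorctl status` output, assuming the layout of the sample line:
# <name> <state> <details...>
def parse_status_line(line)
  name, state, detail = line.strip.split(/\s+/, 3)
  { name: name, state: state, detail: detail }
end

# True if the parsed state column reads RUNNING.
def running?(line)
  parse_status_line(line)[:state] == 'RUNNING'
end

sample = 'kafka-broker                          RUNNING    pid 16461, uptime 3 days, 09:22:38'
status = parse_status_line(sample)
puts "#{status[:name]} is #{status[:state]} (#{status[:detail]})"
```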

Log files

Note: The locations below may be different depending on the Kafka RPM you are actually using.

Kafka log files: /var/log/kafka/*.log
Supervisord log files related to Kafka processes:
- /var/log/supervisor/kafka-broker/kafka-broker.out
- /var/log/supervisor/kafka-broker/kafka-broker.err
Supervisord main log file: /var/log/supervisor/supervisord.log

Custom ZooKeeper chroot (experimental)

Kafka supports custom ZooKeeper chroots, which is useful for multi-tenant ZooKeeper setups. This Puppet module has experimental support for this feature.

Creating the chroot

If Kafka will share a ZooKeeper cluster with other users, you might want to create a znode in ZooKeeper in which to store the data of your Kafka cluster.

First, you must create the znode manually. You can use zkCli.sh, which ships with ZooKeeper, or Kafka's built-in zookeeper-shell. The following example creates the znode /my_kafka.

$ kafka zookeeper-shell <zookeeper_host>:2181
Connecting to kraken-zookeeper
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: kraken-zookeeper(CONNECTED) 0] create /my_kafka kafka
Created /my_kafka

You can use whatever chroot znode path you like. The second argument (data) is arbitrary. In this example we used 'kafka'.

Configuring Kafka to use the ZooKeeper chroot

When configuring the ZooKeeper connection string, you must add the custom chroot only to the last entry in the zookeeper_connect array.

# Irrelevant config settings have been omitted/snipped
kafka::brokers:
  broker1:
    # WRONG!
    #
    # This Hiera configuration is the same as if you had added the following (incorrect) setting
    # to the normal Kafka configuration file `config/server.properties`:
    #
    #    zookeeper.connect=zkserver1:2181/my_kafka,zkserver2:2181/my_kafka
    #
    zookeeper_connect:
      - 'zkserver1:2181/my_kafka'
      - 'zkserver2:2181/my_kafka'

    # CORRECT
    #
    # This Hiera configuration is the same as if you had added the following (correct) setting
    # to the normal Kafka configuration file `config/server.properties`:
    #
    #    zookeeper.connect=zkserver1:2181,zkserver2:2181/my_kafka
    #
    zookeeper_connect:
      - 'zkserver1:2181'
      - 'zkserver2:2181/my_kafka'
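
The rule above can be checked mechanically before a deployment. The following is a minimal sketch, assuming (as the comments in the example state) that the array entries are joined with commas to form zookeeper.connect. The function name is hypothetical, and whether the module itself performs such a check is not stated here.

```ruby
# Hypothetical helper (not part of puppet-kafka): build the zookeeper.connect
# value from the zookeeper_connect array and reject a chroot on any entry
# other than the last one.
def zookeeper_connect_string(entries)
  raise ArgumentError, 'need at least one ZooKeeper entry' if entries.empty?

  *heads, _last = entries
  misplaced = heads.select { |entry| entry.include?('/') }
  unless misplaced.empty?
    raise ArgumentError, "chroot allowed only on the last entry, found: #{misplaced.join(', ')}"
  end
  entries.join(',')
end

puts zookeeper_connect_string(['zkserver1:2181', 'zkserver2:2181/my_kafka'])
```

Passing the array from the WRONG example raises an ArgumentError, because its first entry already carries the chroot.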

Development

It is recommended to run the bootstrap script after a fresh checkout:

$ ./bootstrap

You have access to a bunch of rake commands to help you with module development and testing:

$ bundle exec rake -T
rake acceptance          # Run acceptance tests
rake build               # Build puppet module package
rake clean               # Clean a built module package
rake coverage            # Generate code coverage information
rake help                # Display the list of available rake tasks
rake lint                # Check puppet manifests with puppet-lint / Run puppet-lint
rake module:bump         # Bump module version to the next minor
rake module:bump_commit  # Bump version and git commit
rake module:clean        # Runs clean again
rake module:push         # Push module to the Puppet Forge
rake module:release      # Release the Puppet module, doing a clean, build, tag, push, bump_commit and git push
rake module:tag          # Git tag with the current module version
rake spec                # Run spec tests in a clean fixtures directory
rake spec_clean          # Clean up the fixtures directory
rake spec_prep           # Create the fixtures directory
rake spec_standalone     # Run spec tests on an existing fixtures directory
rake syntax              # Syntax check Puppet manifests and templates
rake syntax:hiera        # Syntax check Hiera config files
rake syntax:manifests    # Syntax check Puppet manifests
rake syntax:templates    # Syntax check Puppet templates
rake test                # Run syntax, lint, and spec tests

Of particular interest are:

rake test -- run syntax, lint, and spec tests
rake syntax -- check that you have valid Puppet and Ruby ERB syntax
rake lint -- check against the Puppet Style Guide
rake spec -- run unit tests

TODO

Enhance in-line documentation of Puppet manifests.
Add more unit tests and specs.
Add rollback/remove functionality to completely purge Kafka related packages and configuration files from a machine.

Change log

See CHANGELOG.

Contributing to puppet-kafka

Code contributions, bug reports, feature requests etc. are all welcome.

If you are new to GitHub please read Contributing to a project for how to send patches and pull requests to puppet-kafka.

License

See LICENSE for licensing information.

References

Puppet modules similar to this module:

wikimedia/puppet-kafka -- focuses on Debian as the target OS, and apparently also supports Kafka mirroring and jmxtrans monitoring (the latter for sending JVM and Kafka broker metrics to tools such as Ganglia or Graphite)

The test setup of this module was derived from:

puppet-module-skeleton