/delivery_boy

A simple way to publish messages to Kafka from Ruby applications

Primary LanguageRubyApache License 2.0Apache-2.0

DeliveryBoy

This library provides a dead easy way to start publishing messages to a Kafka cluster from your Ruby or Rails application!

Installation

Add this line to your application's Gemfile:

gem 'delivery_boy'

And then execute:

$ bundle

Or install it yourself as:

$ gem install delivery_boy

Usage

Once you've installed the gem, and assuming your Kafka broker is running on localhost, you can simply start publishing messages to Kafka directly from your Rails code:

# app/controllers/comments_controller.rb
class CommentsController < ApplicationController
  def create
    @comment = Comment.create!(params)

    # This will publish a JSON representation of the comment to the `comments` topic
    # in Kafka. Make sure to create the topic first, or this may fail.
    DeliveryBoy.deliver(comment.to_json, topic: "comments")
  end
end

The above example will block the server process until the message has been delivered. If you want deliveries to happen in the background in order to free up your server processes more quickly, call #deliver_async instead:

# app/controllers/comments_controller.rb
class CommentsController < ApplicationController
  def show
    @comment = Comment.find(params[:id])

    event = {
      name: "comment_viewed",
      data: {
        comment_id: @comment.id,
        user_id: current_user.id
      }
    }

    # By delivering messages asynchronously you free up your server processes faster.
    DeliveryBoy.deliver_async(event.to_json, topic: "activity")
  end
end

In addition to improving response time, delivering messages asynchronously also protects your application against Kafka availability issues -- if messages cannot be delivered, they'll be buffered for later and retried automatically.

A third method is to produce messages first (without delivering the messages to Kafka yet), and deliver them synchronously later.

 # app/controllers/comments_controller.rb
 class CommentsController < ApplicationController
   def create
     @comment = Comment.create!(params)

     event = {
       name: "comment_created",
       data: {
         comment_id: @comment.id
         user_id: current_user.id
       }
     }

     # This will queue the two messages in the internal buffer.
     DeliveryBoy.produce(comment.to_json, topic: "comments")
     DeliveryBoy.produce(event.to_json, topic: "activity")

     # This will deliver all messages in the buffer to Kafka.
     # This call is blocking.
     DeliveryBoy.deliver_messages
   end
 end

The methods deliver, deliver_async and produce take the following options:

  • topic – the Kafka topic that should be written to (required).
  • key – the key that should be set on the Kafka message (optional).
  • partition – a specific partition number that should be written to (optional).
  • partition_key – a string that can be used to deterministically select the partition that should be written to (optional).

Regarding partition and partition_key: if none are specified, DeliveryBoy will pick a partition at random. If you want to ensure that e.g. all messages related to a user always get written to the same partition, you can pass the user id to partition_key. Don't use partition directly unless you know what you're doing, since it requires you to know exactly how many partitions each topic has, which can change and cause you pain and misery. Just use partition_key or let DeliveryBoy choose at random.

Configuration

You configure DeliveryBoy in three different ways: in a YAML config file, in a Ruby config file, or by setting environment variables.

If you're using Rails, the fastest way to get started is to execute the following in your terminal:

$ bundle exec rails generate delivery_boy:install

This will create a config file at config/delivery_boy.yml with configurations for each of your Rails environments. Open that file in order to make changes.

Note that for all configuration variables, you can pass in an environment variable. These environment variables all take the form DELIVERY_BOY_X, where X is the upper-case configuration variable name, e.g. DELIVERY_BOY_CLIENT_ID.

You can also configure DeliveryBoy in Ruby if you prefer that. By default, the file config/delivery_boy.rb is loaded if present, but you can do this from anywhere – just call DeliveryBoy.configure like so:

DeliveryBoy.configure do |config|
  config.client_id = "yolo"
  # ...
end

The following configuration variables can be set:

Basic

brokers

A list of Kafka brokers that should be used to initialize the client. Defaults to just localhost:9092 in development and test, but in production you need to pass a list of hostname:port strings.

client_id

This is how the client will identify itself to the Kafka brokers. Default is delivery_boy.

log_level

The log level for the logger.

Message delivery

delivery_interval

The number of seconds between background message deliveries. Default is 10 seconds. Disable timer-based background deliveries by setting this to 0.

delivery_threshold

The number of buffered messages that will trigger a background message delivery. Default is 100 messages. Disable buffer size based background deliveries by setting this to 0.

required_acks

The number of Kafka replicas that must acknowledge messages before they're considered as successfully written. Default is all replicas.

See ruby-kafka for more information.

ack_timeout

A timeout executed by a broker when the client is sending messages to it. It defines the number of seconds the broker should wait for replicas to acknowledge the write before responding to the client with an error. As such, it relates to the required_acks setting. It should be set lower than socket_timeout.

max_retries

The number of retries when attempting to deliver messages. The default is 2, so 3 attempts in total, but you can configure a higher or lower number.

retry_backoff

The number of seconds to wait after a failed attempt to send messages to a Kafka broker before retrying. The max_retries setting defines the maximum number of retries to attempt, and so the total duration could be up to max_retries * retry_backoff seconds. The timeout can be arbitrarily long, and shouldn't be too short: if a broker goes down its partitions will be handed off to another broker, and that can take tens of seconds.

Compression

See ruby-kafka for more information.

compression_codec

The codec used to compress messages. Must be either snappy or gzip.

compression_threshold

The minimum number of messages that must be buffered before compression is attempted. By default only one message is required. Only relevant if compression_codec is set.

Network

connect_timeout

The number of seconds to wait while connecting to a broker for the first time. When the Kafka library is initialized, it needs to connect to at least one host in brokers in order to discover the Kafka cluster. Each host is tried until there's one that works. Usually that means the first one, but if your entire cluster is down, or there's a network partition, you could wait up to n * connect_timeout seconds, where n is the number of hostnames in brokers.

socket_timeout

Timeout when reading data from a socket connection to a Kafka broker. Must be larger than ack_timeout or you risk killing the socket before the broker has time to acknowledge your messages.

Buffering

When using the asynhronous API, messages are buffered in a background thread and delivered to Kafka based on the configured delivery policy. Because of this, problems that hinder the delivery of messages can cause the buffer to grow. In order to avoid unlimited buffer growth that would risk affecting the host application, some limits are put in place. After the buffer reaches the maximum size allowed, calling DeliveryBoy.deliver_async will raise Kafka::BufferOverflow.

max_buffer_bytesize

The maximum number of bytes allowed in the buffer before new messages are rejected.

max_buffer_size

The maximum number of messages allowed in the buffer before new messages are rejected.

max_queue_size

The maximum number of messages allowed in the queue before new messages are rejected. The queue is used to ferry messages from the foreground threads of your application to the background thread that buffers and delivers messages. You typically only want to increase this number if you have a very high throughput of messages and the background thread can't keep up with spikes in throughput.

SSL Authentication and authorization

See ruby-kafka for more information.

ssl_ca_cert

A PEM encoded CA cert, or an Array of PEM encoded CA certs, to use with an SSL connection.

ssl_ca_cert_file_path

The path to a valid SSL certificate authority file.

ssl_client_cert

A PEM encoded client cert to use with an SSL connection. Must be used in combination with ssl_client_cert_key.

ssl_client_cert_key

A PEM encoded client cert key to use with an SSL connection. Must be used in combination with ssl_client_cert.

ssl_client_cert_key_password

The password required to read the ssl_client_cert_key. Must be used in combination with ssl_client_cert_key.

SASL Authentication and authorization

See ruby-kafka for more information.

Use it through GSSAPI, PLAIN or OAUTHBEARER.

sasl_gssapi_principal

The GSSAPI principal.

sasl_gssapi_keytab

Optional GSSAPI keytab.

sasl_plain_authzid

The authorization identity to use.

sasl_plain_username

The username used to authenticate.

sasl_plain_password

The password used to authenticate.

sasl_oauth_token_provider

A instance of a class which implements the token method. As described in ruby-kafka

class TokenProvider
  def token
    "oauth-token"
  end
end

DeliveryBoy.configure do |config|
  config.sasl_oauth_token_provider = TokenProvider.new
  config.ssl_ca_certs_from_system = true
end

AWS MSK IAM Authentication and Authorization

sasl_aws_msk_iam_access_key_id

The AWS IAM access key. Required.

sasl_aws_msk_iam_secret_key_id

The AWS IAM secret access key. Required.

sasl_aws_msk_iam_aws_region

The AWS region. Required.

sasl_aws_msk_iam_session_token

The session token. This value can be optional.

Examples

Using a role arn and web identity token to generate temporary credentials:

require "aws-sdk-core"
require "delivery_boy"

role = Aws::AssumeRoleWebIdentityCredentials.new(
  role_arn: ENV["AWS_ROLE_ARN"],
  web_identity_token_file: ENV["AWS_WEB_IDENTITY_TOKEN_FILE"]
)

DeliveryBoy.configure do |c|
  c.sasl_aws_msk_iam_access_key_id = role.credentials.access_key_id
  c.sasl_aws_msk_iam_secret_key_id = role.credentials.secret_access_key
  c.sasl_aws_msk_iam_session_token = role.credentials.session_token
  c.sasl_aws_msk_iam_aws_region    = ENV["AWS_REGION"]
  c.ssl_ca_certs_from_system       = true
end

Testing

DeliveryBoy provides a test mode out of the box. When this mode is enabled, messages will be stored in memory rather than being sent to Kafka. If you use RSpec, enabling test mode is as easy as adding this to your spec helper:

# spec/spec_helper.rb
require "delivery_boy/rspec"

Now your application can use DeliveryBoy in tests without connecting to an actual Kafka cluster. Asserting that messages have been delivered is simple:

describe PostsController do
  describe "#show" do
    it "emits an event to Kafka" do
      post = Post.create!(body: "hello")

      get :show, id: post.id

      # Use this API to extract all messages written to a Kafka topic.
      messages = DeliveryBoy.testing.messages_for("post_views")

      expect(messages.count).to eq 1

      # In addition to #value, you can also pull out #key and #partition_key.
      event = JSON.parse(messages.first.value)

      expect(event["post_id"]).to eq post.id
    end
  end
end

This takes care of clearing messages after each example, as well.

If you're not using RSpec, you can easily replicate the functionality yourself. Call DeliveryBoy.test_mode! at load time, and make sure that DeliveryBoy.testing.clear is called after each test.

Instrumentation & monitoring

Since DeliveryBoy is just an opinionated API on top of ruby-kafka, you can use all the instrumentation made available by that library. You can also use the existing monitoring solutions that integrate with various monitoring services.

Contributing

Bug reports and pull requests are welcome on GitHub. Feel free to join our Slack team and ask how best to contribute!

Support and Discussion

If you've discovered a bug, please file a Github issue, and make sure to include all the relevant information, including the version of DeliveryBoy, ruby-kafka, and Kafka that you're using.

If you have other questions, or would like to discuss best practises, how to contribute to the project, or any other ruby-kafka related topic, join our Slack team!

Copyright and license

Copyright 2017 Zendesk, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.