Amazon Kinesis Output for Telegraf

This is plugin makes use of the Telegraf Output Exec plugin. It will batch up all of the Points in as few Put request to Kinesis. Data can also be compressed to further reduce the size. This should save the number of API requests by a considerable level.

It expects that the configuration for the output ship data in JSON. Here is an example configuration of the Exec Output.

[[outputs.exec]]
  command = ["./telegraf_kinesis_output"]
  data_format = "json"
  json_timestamp_units = "1ms"

We have worked hard to keep the code base and configuration as similar to telegraf as we can. We also try to support the same features as telegraf, such as data formatting for output aka Serializers and Encoders.

Encoders are extracted from telegraf so may have a when being introduced into this plugin. This is due to the encoders being an internal package in telegraf.

About Kinesis

This is not the place to document all of the various Kinesis terms however it maybe useful for users to review Amazons official documentation which is available here.

Amazon Authentication

This plugin uses a credential chain for Authentication with the Kinesis API endpoint. In the following order the plugin will attempt to authenticate.

Assumed credentials via STS if role_arn attribute is specified (source credentials are evaluated from subsequent rules)
Explicit credentials from access_key, secret_key, and token attributes
Shared profile from profile attribute
Environment Variables
Shared Credentials
EC2 Instance Profile

Example Configuration

  access_key = "AWSKEYVALUE"
  secret_key = "AWSSecretKeyValue"
  region = "eu-west-1"
  streamname = "KinesisStreamName"
  aggregate_metrics = true
  content_encoding = "gzip"
  partition = { method = "random" }
  debug = true

Config

AWS Configration

The following AWS configuration variables are available and map directly to the normal AWS settings. If you don't know what they are then you most likely don't need to touch them.

access_key
secret_key
role_arn
profile
shared_credential_file
token
endpoint_url

For this output plugin to function correctly the following variables must be configured.

region
streamname

region

The region is the Amazon region that you wish to connect to. Examples include but are not limited to

eu-west-1
us-west-2
us-east-1
ap-southeast-1
ap-southeast-2

streamname

The streamname is used by the plugin to ensure that data is sent to the correct Kinesis stream. It is important to note that the stream MUST be pre-configured for this plugin to function correctly. If the stream does not exist the plugin will result in telegraf exiting with an exit code of 1.

partition

This is used to group data within a stream. Currently four methods are supported: random, static, tag or measurement

random

This will generate a UUIDv4 for each metric to spread them across shards. Any guarantee of ordering is lost with this method

static

This uses a static string as a partitionkey. All metrics will be mapped to the same shard which may limit throughput.

tag

This will take the value of the specified tag from each metric as the paritionKey. If the tag is not found the default value will be used or telegraf if unspecified

measurement

This will use the measurement's name as the partitionKey.

format

The format configuration value has been designated to allow people to change the format of the Point as written to Kinesis. Right now there are two supported formats string and custom.

string

String is defined using the default Point.String() value and translated to []byte for the Kinesis stream.

custom

Custom is a string defined by a number of values in the FormatMetric() function.

aggregate_metrics

This will make the plugin gather the metrics and send them as blocks of metrics in Kinesis records. The number of put requests depends on a few factors.

If a random key is in use then a block for each shard in the stream will be created unless there isn't enough metrics then as many blocks as metrics.
Each record will be 1020kb in size + partition key

content_encoding

content_encoding can be anything that telegraf supports.

Examples below

gzip
snappy

gzip

This will make the plugin compress the data using GZip before the data is shipped to Kinesis. GZip is slower than snappy but generally fast enough and gives much better compression. Use GZip in most cases.

snappy

This will make the plugin compress the data using Google's Snappy compression before the data is shipped to Kinesis. Snappy is much quicker and would be used if you are taking too long to compress and write before the next flush interval.

debug

Prints debugging data into the logs.

Output Data Formatting aka Serializers

There is a minor configuration file change here from telegraf. The formatting of data needs to be under its own table. However all options currently supported by telegraf are supported here.

All output data formatting is stored under the heading [formatting].

See Telegraf Serializers for more information about the serializers.

Example below

region = "eu-west-1"
streamname = "RandyTest"
override_shard_count = 2
content_encoding = "gzip"
[partition]
  method = "random"
# See below the table and format options.
[formatting]
  data_format = "json"

nvmlabs/telegraf-output-kinesis