Log aggregating, filtering, redirecting service

Primary language: JavaScript. License: MIT.

StreamStash

streamstash is a log aggregating, filtering, and redirecting service: a lightweight Node.js alternative to projects like logstash, flume, and fluentd.

Usage

I typically set up a separate repo containing my config.js and a package.json that lists streamstash as a dependency. I then deploy that repo to my servers and run npm install. The last step is to run streamstash:

<PROJECT SOURCE DIR>/node_modules/streamstash/bin/streamstash <PROJECT SOURCE DIR>/config.js

An example of this can be found here

Inputs

Inputs slurp event data from different places and provide it to streamstash for filtering (by filters) and outputting (by outputs).

Inputs packaged with streamstash:

  • RELP: Provides easy and reliable integration with rsyslog, using rsyslog's Reliable Event Logging Protocol. For more info, see the relp webpage.
  • StdIn: Takes data received from standard input and creates events from it.

Example usage can be found in the examples folder

Filters

Filters are JavaScript functions that allow you to modify event data or control the flow of an event through the system.

The main reason this project exists is to give users a "real" scripting language for working with event data. If you have ever tried using logstash, you may have gotten irritated when trying to do anything more than basic data manipulation. That is mainly because its configuration language is almost Ruby, but not quite.

Every event will contain the following properties in the data object:

  • source: The input plugin that generated the event.
  • message: The event message.
  • timestamp: The time the event occurred or was received by the input plugin.
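As a rough illustration of the list above, the data object a filter sees might look like the sketch below. The values are made up, the 'stdin' source name is a guess, and the use of a Date for timestamp is an assumption rather than something this README specifies.

```javascript
// Hypothetical event data, for illustration only. The values are invented
// and the exact type of `timestamp` is an assumption (a Date is used here).
const exampleData = {
    source: 'stdin',
    message: 'user 42 logged in',
    timestamp: new Date('2015-01-01T00:00:00Z')
}

// Filters are free to add further properties next to the three standard ones
exampleData.gotHere = 'Yay!'

console.log(Object.keys(exampleData))
```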

A simple filter example:

addFilter(function (event) {
    // Add a gotHere property to the event data
    event.data.gotHere = 'Yay!'

    // Allow the event to progress to the next filter or on to output plugins
    event.next()
})

A slightly more advanced example, this time with a named filter:

addFilter('cool', function (event) {
    // Drop all events with a 'stupid event' message; these events will never see an output plugin
    if (event.data.message === 'stupid event') {
        // Always return when calling cancel(), complete(), or next() before the end of the filter,
        // otherwise the code below keeps running on an event that has already been handed off
        return event.cancel()
    }

    // Have any events with a 'high priority' message skip any other filters and go directly to output plugins
    if (event.data.message === 'high priority') {
        return event.complete()
    }

    // All other events get here
    event.data.superAwesome = 'sure is'

    // Want to rename a field to have a crazy character?
    event.data['@message'] = event.data.message
    delete event.data.message

    // Since this is the last thing in the filter there is no need to return
    event.next()
})
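To see which path an event takes without running streamstash, the filter logic can be written as a plain function and called with a stand-in event. This is only a sketch: makeStubEvent and outcomeFor are hypothetical test helpers, not part of streamstash, and the stub implements nothing beyond the data, next(), cancel() and complete() members used above.

```javascript
// Hypothetical stand-in for a streamstash event. It only records which
// control method the filter called; the real event object does more.
function makeStubEvent(message) {
    const stub = {
        data: { source: 'test', message: message, timestamp: new Date() },
        outcome: null
    }
    for (const method of ['next', 'cancel', 'complete']) {
        stub[method] = function () {
            stub.outcome = method
        }
    }
    return stub
}

// Same branching as the 'cool' filter, kept as a plain function so it can be
// passed to addFilter('cool', coolFilter) in config.js and also called directly.
function coolFilter(event) {
    if (event.data.message === 'stupid event') {
        return event.cancel()
    }
    if (event.data.message === 'high priority') {
        return event.complete()
    }
    event.data.superAwesome = 'sure is'
    event.next()
}

// Runs the filter against a stub and reports which control method it called
function outcomeFor(message) {
    const event = makeStubEvent(message)
    coolFilter(event)
    return event.outcome
}

console.log(outcomeFor('stupid event'))  // cancel
console.log(outcomeFor('high priority')) // complete
console.log(outcomeFor('anything else')) // next
```

Keeping the filter body in a named function like this makes it easy to unit test the branching separately from the config file.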

Filters get an integer name by default. If you want better error and telemetry reporting, give them a name.

Remember, this is all pure Node.js. You can do any crazy exotic thing you want. Keep in mind that the more a filter does, the slower each event is processed.
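Because filters are ordinary Node.js code, built-in modules are available to them. The sketch below uses JSON.parse and the os module. The property names parsed, parseFailed and processedBy are invented for this example, and stubEvent is a hypothetical stand-in used only to exercise the function outside streamstash.

```javascript
const os = require('os')

// Tries to parse the message as JSON and records which host handled the event.
// Everything here is synchronous, so the work adds directly to per-event time.
function enrichFilter(event) {
    try {
        event.data.parsed = JSON.parse(event.data.message)
    } catch (err) {
        // Not JSON: leave the message alone and set a marker instead
        event.data.parseFailed = true
    }
    event.data.processedBy = os.hostname()
    event.next()
}

// Hypothetical minimal event stub, only for trying the filter by hand
function stubEvent(message) {
    const stub = { data: { message: message }, passedOn: false }
    stub.next = function () {
        stub.passedOn = true
    }
    return stub
}

const jsonEvent = stubEvent('{"user":"alice","status":200}')
enrichFilter(jsonEvent)
console.log(jsonEvent.data.parsed.status) // 200

const plainEvent = stubEvent('not json')
enrichFilter(plainEvent)
console.log(plainEvent.data.parseFailed) // true
```

In a config file, the same function would presumably be registered with addFilter('enrich', enrichFilter), following the named-filter form shown earlier.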

Docker

In the CMD section of the Dockerfile, replace relp_basic.js with your own config.js. Also move the Dockerfile from the examples folder into the project's top-level directory, or adjust the paths in your commands to match your file structure.

Building and running:

docker build -t streamstash .
docker run -p 9200:9200 -p 9300:9300 -p 5514:5514 streamstash

Outputs

Outputs are exactly what they sound like. They output an event to a destination.

Outputs packaged with streamstash:

  • ElasticSearch: Outputs event data to your ElasticSearch cluster. Works great with Kibana.
  • StdOut: Writes event data to standard output.

Example usage can be found in the examples folder

Telemetry

If enabled, streamstash will output interesting stats to statsite, statsd, or any other service that conforms to the statsd line protocol.

General stats

  • events.processing: A gauge of how many events are currently being processed.
  • events.total: A gauge of how many events have been processed since the start of the current process.
  • filter.<name>: A timer of how long each event took in each filter. Typically a histogram is created from the data so you can see the p99, p95, mean, max, etc. of the time spent in each filter.
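For reference, a metric in the statsd line protocol has the form name:value|type, where the type g marks a gauge and ms marks a timer in milliseconds. The sketch below formats the stats listed above in that shape. statsdLine is a hypothetical helper, the values are made up, and whether streamstash adds a prefix to the metric names is not covered here.

```javascript
// Formats one metric in the statsd line protocol: <name>:<value>|<type>
function statsdLine(name, value, type) {
    return name + ':' + value + '|' + type
}

// 'g' is a gauge, 'ms' is a timer measured in milliseconds.
// The numbers are invented; 'cool' is the named filter from the example above.
const lines = [
    statsdLine('events.processing', 12, 'g'),
    statsdLine('events.total', 3400, 'g'),
    statsdLine('filter.cool', 0.8, 'ms')
]

console.log(lines.join('\n'))
```

Each line would normally be sent to the stats service as its own UDP datagram or as part of a newline-separated batch.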

Some plugins may also emit stats.

RELP

  • <PLUGIN NAME>.connection: A gauge of the number of current connections being handled.

Example usage can be found in the examples folder

TODO

  • Need to think about outputs for special events (send interesting thing to slack, email, etc)
  • Add some helpers for things like renaming fields in filters?
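Until such helpers exist, a rename helper can live in your own config.js. The following is only a sketch of what one might look like: renameField is a hypothetical function, not part of streamstash, and it simply wraps the assign-then-delete pattern from the named filter example.

```javascript
// Hypothetical helper, not part of streamstash: moves data[from] to data[to].
// Does nothing if the source field is missing or both names are the same.
function renameField(data, from, to) {
    if (from === to || !Object.prototype.hasOwnProperty.call(data, from)) {
        return data
    }
    data[to] = data[from]
    delete data[from]
    return data
}

const sample = { message: 'hello' }
renameField(sample, 'message', '@message')
console.log(JSON.stringify(sample)) // {"@message":"hello"}
```

Inside a filter this would be called as renameField(event.data, 'message', '@message') before event.next().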