Ruminant

Ruminant queries an ElasticSearch database, processes the results and feeds them as time series to an Influx database. ETL for a rather specific use case, basically.

How it works

Processing data with Ruminant involves a few steps that you should understand:

Find out where to start: First, the targeted time series in the Influx database is queried for its last marker timestamp. This timestamp indicates when the last run of Ruminant was performed and is used as the starting point for this run.

Fetch the data from ElasticSearch: The provided query is executed and its result is prepared for processing. A query can be executed in two ways:

If no sampler configuration is provided, the query is executed once. This kind of execution is a good fit if you can extract the timestamps for your time series from the results of your ElasticSearch query, e.g. if the query performs a date_histogram aggregation.
If a query does not contain a date_histogram aggregation and needs to be executed once per point in your time series, a sampler configuration can be passed. This allows the same query to be run multiple times with incrementing timestamps.

This step is known as regurgitate in the ruminant jargon.

Process the results and build time series: ElasticSearch returns the results of the query as JSON data. With simple expressions, Ruminant allows you to iterate over these results and lets you indicate where in the JSON the information can be found that should be stored with the time series.

This step is known as ruminate in the ruminant jargon.

Persist time series: The set of data points created in the last step is then saved to the Influx database and series specified in your configuration. In addition, a new marker timestamp is written that records the latest point in your series, so that the next run knows where to start.

This step is known as gulp in the ruminant jargon.

Usage

Please note: Do not rely on this just yet. So far, the project has been a quick shot at solving a particular problem. The source has neither documentation nor tests to ensure that everything works as expected. Those topics will be addressed soon.

Install via:

go get -u github.com/unprofession-al/ruminant

Run via:

ruminant -h
Feed data from ElasticSearch to InfluxDB

Usage:
  ruminant [command]

Available Commands:
  burp        Test the query and iterator
  config      Prints the config used to the stdout
  gulp        Feed data to Infux DB
  init        Creates the Database if required and sets a start date
  poop        Dump data from Infux DB to stdout
  vomit       Throw up to standart output

Flags:
  -c, --cfg string   config file (default is $HOME/ruminant.yaml) (default "$HOME/ruminant.yaml")

Annotated Configuration

Jump to the examples to find some annotated configuration files.

Credits

Main third-party components other than the Go standard library are:
