Generates dummy log data

Dummer

Dummer is a set of tools for generating dummy log data. I made it for benchmarking Fluentd.

This gem includes three executable commands:

  1. dummer
  2. dummer_simple
  3. dummer_yes

Installation

Add this line to your application's Gemfile:

gem 'dummer'

And then execute:

$ bundle

Or install it yourself as:

$ gem install dummer

Run as

$ dummer -c dummer.conf
$ dummer_simple [options]
$ dummer_yes [options]

dummer

dummer allows you to

  1. specify a rate of generating messages per second,
  2. determine a log format, and
  3. generate logs randomly
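
As a rough illustration of the first point, the sketch below paces output to a fixed number of messages per second. It is a minimal sketch, not dummer's implementation: the helper name emit_at_rate and its interval parameter are hypothetical, and it writes each second's messages in one burst and then sleeps, which may differ from how dummer actually spaces its output.

```ruby
# Hypothetical helper, not part of dummer: emit `rate` messages per interval
# (1 second by default), sleeping away whatever remains of each interval.
def emit_at_rate(rate:, seconds:, out: $stdout, interval: 1.0)
  emitted = 0
  seconds.times do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    rate.times do
      out.puts(yield(emitted)) # the block builds the message for index `emitted`
      emitted += 1
    end
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    sleep(interval - elapsed) if elapsed < interval
  end
  emitted # total number of messages written
end

# Three messages in one second, each with a zero-padded counter.
emit_at_rate(rate: 3, seconds: 1) { |n| format("id:%04d", n) }
```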

Usage (1) - Write to a file

Create a configuration file. A sample configuration is as follows:

# dummer.conf
configure 'sample' do
  output "dummy.log"
  rate 500
  delimiter "\t"
  labeled true
  field :id, type: :integer, countup: true, format: "%04d"
  field :time, type: :datetime, format: "[%Y-%m-%d %H:%M:%S]", random: false
  field :level, type: :string, any: %w[DEBUG INFO WARN ERROR]
  field :method, type: :string, any: %w[GET POST PUT]
  field :uri, type: :string, any: %w[/api/v1/people /api/v1/textdata /api/v1/messages]
  field :reqtime, type: :float, range: 0.1..5.0
  field :foobar, type: :string, length: 8
end 

Running

$ dummer -c dummer.conf

This writes lines like the following to dummy.log (the file specified by the output parameter):

id:0422  time:[2013-11-19 02:34:58]  level:INFO  method:POST uri:/api/v1/textdata  reqtime:3.9726677258569842  foobar:LFK6XV1N
id:0423  time:[2013-11-19 02:34:58]  level:DEBUG method:GET  uri:/api/v1/people    reqtime:0.49912949125272277 foobar:DcOYrONH
id:0424  time:[2013-11-19 02:34:58]  level:WARN  method:POST uri:/api/v1/textdata  reqtime:2.930590441869852   foobar:XEZ5bQsh
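
To make the line layout concrete, here is a minimal sketch of how the delimiter, labeled, and label_delimiter settings could combine into one output line. The helper format_line is hypothetical and is not taken from dummer's source; it only mirrors the behavior described in this README.

```ruby
# Hypothetical helper: join field values with `delimiter`, optionally
# prefixing each value with its field name and `label_delimiter`.
def format_line(record, delimiter: "\t", labeled: true, label_delimiter: ":")
  record.map { |name, value|
    labeled ? "#{name}#{label_delimiter}#{value}" : value.to_s
  }.join(delimiter)
end

record = { id: "0422", time: "[2013-11-19 02:34:58]", level: "INFO" }
puts format_line(record)                 # labeled, tab-delimited
puts format_line(record, labeled: false) # values only
```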

Usage (2) - Post to Fluentd process

(experimental)

Create a configuration file. Assume that a fluentd process is running on localhost:24224. A sample configuration is as follows:

# dummer.conf
configure 'sample' do
  host "localhost" # define `host` and `port` instead of `output`
  port 24224
  rate 500
  tag type: :string, any: %w[raw.syslog raw.message raw.nginx] # configure tag
  field :id, type: :integer, countup: true, format: "%04d"
  field :level, type: :string, any: %w[DEBUG INFO WARN ERROR]
  field :method, type: :string, any: %w[GET POST PUT]
  field :uri, type: :string, any: %w[/api/v1/people /api/v1/textdata /api/v1/messages]
  field :reqtime, type: :float, range: 0.1..5.0
  field :foobar, type: :string, length: 8
end 

Running

$ dummer -c dummer.conf

Data is posted to the Fluentd process. Below is the resulting Fluentd log, generated by out_stdout:

2014-01-31 00:55:32 +0900 raw.message: {"id":"1377","level":"INFO","method":"POST","uri":"/api/v1/people","reqtime":1.678867810409548,"foobar":"paOIWxhQ"}
2014-01-31 00:55:32 +0900 raw.syslog: {"id":"1378","level":"INFO","method":"GET","uri":"/api/v1/people","reqtime":4.8412816521873445,"foobar":"kUvnC0MK"}
2014-01-31 00:55:32 +0900 raw.message: {"id":"1379","level":"WARN","method":"GET","uri":"/api/v1/people","reqtime":3.584494903998221,"foobar":"KD78mpjX"}
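
Note that in the sample above id appears as a JSON string while reqtime is a number, presumably because id goes through the "%04d" format and reqtime has no format. The sketch below reproduces that shape locally. It does not talk to Fluentd at all: the helper build_event is hypothetical, and the printed line only imitates the out_stdout layout shown above.

```ruby
require "json"

TAGS = %w[raw.syslog raw.message raw.nginx].freeze

# Hypothetical helper: build one [tag, record] pair shaped like the sample.
def build_event(id)
  record = {
    "id"      => format("%04d", id),              # formatted, so a String
    "level"   => %w[DEBUG INFO WARN ERROR].sample,
    "reqtime" => rand(0.1..5.0)                   # unformatted, so a Float
  }
  [TAGS.sample, record]
end

tag, record = build_event(1377)
puts "#{Time.now.strftime('%Y-%m-%d %H:%M:%S %z')} #{tag}: #{JSON.generate(record)}"
```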

CLI Options

You can specify some configuration parameters on the command line instead of writing them in a configuration file.

$ dummer help start
Usage:
  dummer start

Options:
  -c, [--config=CONFIG]            # Config file
                                   # Default: dummer.conf
  -r, [--rate=N]                   # Number of generating messages per second
  -o, [--output=OUTPUT]            # Output file
  -h, [--host=HOST]                # Host of fluentd process
  -p, [--port=N]                   # Port of fluentd process
  -m, [--message=MESSAGE]          # Output message
  -d, [--daemonize]                # Daemonize. Stop with `dummer stop`
  -w, [--workers=N]                # Number of parallels
      [--worker-type=WORKER_TYPE]
                                   # Default: process
  -p, [--pid-path=PID_PATH]
                                   # Default: dummer.pid

Configuration Parameters

The following parameters are available in the configuration file:

  • output

    Specify an output filename or an IO object (STDOUT, STDERR).

  • host

    Post data to a Fluentd process on the specified host. Specify either output or host, not both.

  • port

    Post data to a Fluentd process on the specified port. Default is 24224.

  • rate

    Specify how many messages to generate per second. Default: 500 msgs / sec

  • workers

    Specify the number of processes for parallel processing.

  • delimiter

    Specify the delimiter between each field. Default: "\t" (Tab)

  • labeled

    Whether to add the field name as a label. Default: true

  • label_delimiter

    Specify the delimiter between the label and the value. Default: ":" (colon)

  • tag

    Define the tag field to generate. This takes effect only when posting data to a Fluentd process with host and port.

  • field

    Random field generator mode. Define the data fields to generate. When field is used, the message and input options are ignored. See also the Field Data Types section below.

  • message

    Fixed message mode. See message.conf as an example. This mode is fast because it does not need to generate random values.

  • input

    Input file mode. Use this to write messages by reading the lines of an input file in rotation. The message option is ignored. See input.conf as an example. This mode is also fast.
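
Reading lines "in rotation", as described for the input parameter, means wrapping back to the first line once the end of the file is reached. A minimal sketch of that idea, using a hypothetical helper rotate_lines that is not part of dummer:

```ruby
# Hypothetical helper: return `count` messages by cycling through `lines`,
# starting over from the first line whenever the end is reached.
def rotate_lines(lines, count)
  lines.cycle.first(count)
end

lines = ["level:INFO", "level:WARN", "level:ERROR"]
puts rotate_lines(lines, 5)
```

With three input lines and five requested messages, the fourth and fifth messages repeat the first and second lines.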

Field Data Types

You can specify the following data types for your tag and field parameters:

  • :datetime

    • :format

      You can specify the datetime format, for example %Y-%m-%d %H:%M:%S. See Time#strftime for details.

    • :random

      Generate datetime randomly. Default: false (Time.now)

    • :value

      You can specify a fixed Time object.

  • :string

    • :any

      You can specify an array of strings, then the generator picks one from them randomly

    • :length

      You can specify the length of string to generate randomly

    • :value

      You can specify a fixed string

  • :integer

    • :format

      You can specify a format string, for example %03d.

    • :range

      You can specify a range of integers, then the generator picks one in the range (uniform) randomly

    • :countup

      Generate incrementing (count-up) values. Default: false

    • :value

      You can specify a fixed integer

  • :float

    • :format

      You can specify a format string, for example %03.1f.

    • :range

      You can specify a range of float numbers, then the generator picks one in the range (uniform) randomly

    • :value

      You can specify a fixed float number
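
The options above can be read as small value generators. The following is a minimal sketch under stated assumptions, not dummer's own code: the helper names (string_field, integer_field, float_field, datetime_field) are hypothetical, and so are the fallback defaults (length 8, range 0..999, range 0.0..1.0) and the choice to start count-up at 0.

```ruby
# Hypothetical generators illustrating the options above; not dummer's code.
ALNUM = [*("a".."z"), *("A".."Z"), *("0".."9")].freeze

def string_field(any: nil, length: 8, value: nil)
  return -> { value } if value
  return -> { any.sample } if any                 # pick one of the candidates
  -> { Array.new(length) { ALNUM.sample }.join }  # random alphanumeric string
end

def integer_field(format: nil, range: nil, countup: false, value: nil)
  count = -1
  pick =
    if value
      -> { value }
    elsif countup
      -> { count += 1 }                           # 0, 1, 2, ...
    else
      -> { rand(range || (0..999)) }              # uniform within the range
    end
  format ? -> { Kernel.format(format, pick.call) } : pick
end

def float_field(format: nil, range: (0.0..1.0), value: nil)
  pick = value ? -> { value } : -> { rand(range) }
  format ? -> { Kernel.format(format, pick.call) } : pick
end

def datetime_field(format: "%Y-%m-%d %H:%M:%S", random: false, value: nil)
  pick =
    if value
      -> { value }
    elsif random
      -> { Time.at(rand * Time.now.to_f) }        # some time between epoch and now
    else
      -> { Time.now }
    end
  -> { pick.call.strftime(format) }
end

id    = integer_field(countup: true, format: "%04d")
level = string_field(any: %w[DEBUG INFO WARN ERROR])
time  = datetime_field(format: "[%Y-%m-%d %H:%M:%S]")
puts [id.call, time.call, level.call].join("\t")
```

Returning a lambda per field keeps the count-up state private to each field, so two count-up fields would not share a counter.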

dummer_simple

I created a simple version of dummer because dummer, with its rich feature set, could not reach the maximum system I/O throughput. This simple version, dummer_simple, reached the system I/O limit in my environment.

Note that this simple script cannot post data to a Fluentd process; it only supports writing to a file.

Usage

$ dummer_simple [options]

Options

Usage:
  dummer_simple

Options:
      [--sync]             # Set `IO#sync=true`
  -s, [--second=N]         # Duration of running in second
                           # Default: 1
  -p, [--parallel=N]       # Number of processes to run in parallel
                           # Default: 1
  -o, [--output=OUTPUT]    # Output file
                           # Default: dummy.log
  -i, [--input=INPUT]      # Input file (Output messages by reading lines of the file in rotation)
  -m, [--message=MESSAGE]  # Output message
                           # Default: time:2013-11-20 23:39:42 +0900    level:ERROR     method:POST     uri:/api/v1/people      reqtime:3.1983877060667103
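
The idea behind these options can be sketched as a tight loop that writes one fixed message for a given duration. This is only an illustration of the --second, --output, --message, and --sync options: the helper write_for is hypothetical, it runs a single process (no --parallel), and it is not dummer_simple's actual code.

```ruby
# Hypothetical helper: append `message` to `path` as fast as possible for
# `seconds` seconds and return the number of lines written.
def write_for(path, message:, seconds:, sync: false)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  count = 0
  File.open(path, "a") do |io|
    io.sync = sync # true disables Ruby-level buffering (slower, like --sync)
    while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
      io.puts(message)
      count += 1
    end
  end
  count
end

# Example (commented out so that loading this file does not create dummy.log):
# lines = write_for("dummy.log", message: "level:ERROR\tmethod:POST", seconds: 1)
# puts "#{lines} lines/sec"
```

Dividing the returned count by the duration gives a rough lines-per-second figure for comparing settings.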

dummer_yes

I created dummer_yes, a wrapper around the yes command, to confirm that dummer_simple achieves the maximum system I/O throughput.

I no longer use the dummer_yes command because I have verified that dummer_simple reaches the I/O limit, but I am keeping it so that users can run their own verification experiments.

Usage

$ dummer_yes [options]

Options

Usage:
  dummer_yes

Options:
  -s, [--second=N]         # Duration of running in second
                           # Default: 1
  -p, [--parallel=N]       # Number of processes to run in parallel
                           # Default: 1
  -o, [--output=OUTPUT]    # Output file
                           # Default: dummy.log
  -m, [--message=MESSAGE]  # Output message
                           # Default: time:2013-11-20 23:39:42 +0900  level:ERROR method:POST uri:/api/v1/people  reqtime:3.1983877060667103

Relatives

There is fluent-plugin-dummydata-producer, but I wanted to write dummy data to a log file, and I wanted a separate, standalone tool for benchmarking.

ToDo

  1. write tests

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Licenses

See LICENSE.txt