Dummer is a set of tools for generating dummy log data. I made it for benchmarking Fluentd.
This gem includes three executable commands:
- dummer
- dummer_simple
- dummer_yes
Add this line to your application's Gemfile:
gem 'dummer'
And then execute:
$ bundle
Or install it yourself as:
$ gem install dummer
Run as
$ dummer -c dummer.conf
$ dummer_simple [options]
$ dummer_yes [options]
dummer allows you to:
- specify a rate of generating messages per second,
- determine a log format, and
- generate logs randomly
Create a configuration file. A sample configuration is as follows:
# dummer.conf
configure 'sample' do
output "dummy.log"
rate 500
delimiter "\t"
labeled true
field :id, type: :integer, countup: true, format: "%04d"
field :time, type: :datetime, format: "[%Y-%m-%d %H:%M:%S]", random: false
field :level, type: :string, any: %w[DEBUG INFO WARN ERROR]
field :method, type: :string, any: %w[GET POST PUT]
field :uri, type: :string, any: %w[/api/v1/people /api/v1/textdata /api/v1/messages]
field :reqtime, type: :float, range: 0.1..5.0
field :foobar, type: :string, length: 8
end
Running
$ dummer -c dummer.conf
writes lines like the following to dummy.log (the file specified by the output parameter):
id:0422 time:[2013-11-19 02:34:58] level:INFO method:POST uri:/api/v1/textdata reqtime:3.9726677258569842 foobar:LFK6XV1N
id:0423 time:[2013-11-19 02:34:58] level:DEBUG method:GET uri:/api/v1/people reqtime:0.49912949125272277 foobar:DcOYrONH
id:0424 time:[2013-11-19 02:34:58] level:WARN method:POST uri:/api/v1/textdata reqtime:2.930590441869852 foobar:XEZ5bQsh
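The way delimiter, labeled, and label_delimiter shape each line can be illustrated with a minimal Ruby sketch. The helpers format_line and parse_line are hypothetical names introduced here for illustration; they are not part of dummer and only mirror the settings described in this README.

```ruby
# Hypothetical helpers, not part of dummer.

# Join field values into one log line, optionally prefixing each with its name.
def format_line(record, delimiter: "\t", labeled: true, label_delimiter: ":")
  parts = record.map do |name, value|
    labeled ? "#{name}#{label_delimiter}#{value}" : value.to_s
  end
  parts.join(delimiter)
end

# Split a labeled line back into a Hash of strings.
def parse_line(line, delimiter: "\t", label_delimiter: ":")
  line.chomp.split(delimiter).to_h do |pair|
    # Split on the first label delimiter only, because values such as
    # "[2013-11-19 02:34:58]" contain colons themselves.
    pair.split(label_delimiter, 2)
  end
end

record = { id: "0422", time: "[2013-11-19 02:34:58]", level: "INFO" }
line = format_line(record)
puts line
p parse_line(line)
```

Parsing the generated file back like this can be a quick way to check that a configuration produces the fields you expect.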
(experimental)
Create a configuration file. Assume that a fluentd process is running on localhost:24224. A sample configuration is as follows:
# dummer.conf
configure 'sample' do
host "localhost" # define `host` and `port` instead of `output`
port 24224
rate 500
tag type: :string, any: %w[raw.syslog raw.message raw.nginx] # configure tag
field :id, type: :integer, countup: true, format: "%04d"
field :level, type: :string, any: %w[DEBUG INFO WARN ERROR]
field :method, type: :string, any: %w[GET POST PUT]
field :uri, type: :string, any: %w[/api/v1/people /api/v1/textdata /api/v1/messages]
field :reqtime, type: :float, range: 0.1..5.0
field :foobar, type: :string, length: 8
end
Running
$ dummer -c dummer.conf
posts data to the fluentd process. Below is the fluentd log generated by out_stdout:
2014-01-31 00:55:32 +0900 raw.message: {"id":"1377","level":"INFO","method":"POST","uri":"/api/v1/people","reqtime":1.678867810409548,"foobar":"paOIWxhQ"}
2014-01-31 00:55:32 +0900 raw.syslog: {"id":"1378","level":"INFO","method":"GET","uri":"/api/v1/people","reqtime":4.8412816521873445,"foobar":"kUvnC0MK"}
2014-01-31 00:55:32 +0900 raw.message: {"id":"1379","level":"WARN","method":"GET","uri":"/api/v1/people","reqtime":3.584494903998221,"foobar":"KD78mpjX"}
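To check what arrived (for example, that every configured tag shows up), the out_stdout lines above can be parsed back into time, tag, and record. This is a minimal sketch that assumes the line layout shown above; parse_out_stdout and OUT_STDOUT_LINE are hypothetical names, not part of dummer or fluentd.

```ruby
require "json"
require "time"

# Assumed layout: "<date> <time> <zone> <tag>: <json record>"
OUT_STDOUT_LINE = /\A(?<time>\S+ \S+ \S+) (?<tag>[^:\s]+): (?<record>\{.*\})\s*\z/

# Returns a Hash with :time, :tag and :record, or nil if the line does not match.
def parse_out_stdout(line)
  m = OUT_STDOUT_LINE.match(line)
  return nil unless m

  {
    time: Time.strptime(m[:time], "%Y-%m-%d %H:%M:%S %z"),
    tag: m[:tag],
    record: JSON.parse(m[:record])
  }
end

line = '2014-01-31 00:55:32 +0900 raw.message: {"id":"1377","level":"INFO","method":"POST"}'
event = parse_out_stdout(line)
puts event[:tag]
puts event[:record]["level"]
```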
You can specify some configuration parameters on the command line instead of writing them in a configuration file.
$ dummer help start
Usage:
dummer start
Options:
-c, [--config=CONFIG] # Config file
# Default: dummer.conf
-r, [--rate=N] # Number of generating messages per second
-o, [--output=OUTPUT] # Output file
-h, [--host=HOST] # Host of fluentd process
-p, [--port=N] # Port of fluentd process
-m, [--message=MESSAGE] # Output message
-d, [--daemonize] # Daemonize. Stop with `dummer stop`
-w, [--workers=N] # Number of parallels
[--worker-type=WORKER_TYPE]
# Default: process
-p, [--pid-path=PID_PATH]
# Default: dummer.pid
The following parameters are available in the configuration file:
- output
  Specify a filename to output to, or an IO object (STDOUT, STDERR).
- host
  Post data to a fluentd process on the specified host. Either output or host can be specified.
- port
  Post data to a fluentd process on the specified port. Default is 24224.
- rate
  Specify how many messages to generate per second. Default: 500 msgs / sec
- workers
  Specify the number of processes for parallel processing.
- delimiter
  Specify the delimiter between each field. Default: "\t" (Tab)
- labeled
  Whether to add the field name as a label or not. Default: true
- label_delimiter
  Specify the delimiter between the label and the value. Default: ":" (colon)
- tag
  Define the tag field to generate. This is effective only when posting data to a fluentd process with host and port.
- field
  Random field generator mode. Define the data fields to generate. The message and input options are ignored. See also the Field Data Types section below.
- message
  Specific message generation mode. See message.conf as an example. This mode works pretty fast because it does not need to generate values randomly.
- input
  Messages taken from an input file mode. Use this if you want to write messages by reading lines of an input file in rotation. The message option is ignored. See input.conf as an example. This mode also works fast.
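"Reading lines of an input file in rotation" means that after the last line the output starts again from the first. A minimal Ruby sketch of that idea follows; rotating_lines is a hypothetical helper and not dummer's actual implementation.

```ruby
require "tempfile"

# Hypothetical helper: returns an endless Enumerator over the lines of a file
# (line 1, 2, ..., n, 1, 2, ...).
def rotating_lines(path)
  lines = File.readlines(path, chomp: true)
  raise ArgumentError, "#{path} is empty" if lines.empty?

  lines.cycle
end

Tempfile.create("input") do |f|
  f.puts "first", "second"
  f.flush
  p rotating_lines(f.path).take(5)
  # => ["first", "second", "first", "second", "first"]
end
```

Because the lines are loaded once and then cycled in memory, no random values need to be generated per message, which is consistent with this mode being described as fast.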
You can specify the following data types for your tag and field parameters:
- :datetime
  - :format
    You can specify the format of the datetime, such as %Y-%m-%d %H:%M:%S. See Time#strftime for details.
  - :random
    Generate the datetime randomly. Default: false (Time.now)
  - :value
    You can specify a fixed Time object.
- :string
  - :any
    You can specify an array of strings, then the generator picks one of them randomly.
  - :length
    You can specify the length of the string to generate randomly.
  - :value
    You can specify a fixed string.
- :integer
  - :format
    You can specify a format string, such as %03d.
  - :range
    You can specify a range of integers, then the generator picks one in the range uniformly at random.
  - :countup
    Generate count-up data. Default: false
  - :value
    You can specify a fixed integer.
- :float
  - :format
    You can specify a format string, such as %03.1f.
  - :range
    You can specify a range of float numbers, then the generator picks one in the range uniformly at random.
  - :value
    You can specify a fixed float number.
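As a rough illustration of what these options mean, here is a minimal Ruby sketch of one generator per data type. All names (gen_string, countup, gen_integer, gen_float, gen_datetime) are hypothetical and not dummer's API. The alphanumeric character set, the default ranges, and the count-up starting value of 0 are assumptions, and the :random option of :datetime is left out because its range is not documented here.

```ruby
# Hypothetical generators mirroring the options above; not dummer's code.
ALNUM = [*"a".."z", *"A".."Z", *"0".."9"].freeze

# :string with :value, :any or :length
def gen_string(any: nil, length: 8, value: nil)
  return value if value
  return any.sample if any

  Array.new(length) { ALNUM.sample }.join
end

# :integer with countup: true; returns a lambda yielding 0, 1, 2, ...
def countup(format: nil, start: 0)
  n = start - 1
  lambda do
    n += 1
    format ? format % n : n
  end
end

# :integer with :range, :format or :value
def gen_integer(range: 0..100, format: nil, value: nil)
  n = value || rand(range)
  format ? format % n : n
end

# :float with :range, :format or :value
def gen_float(range: 0.0..1.0, format: nil, value: nil)
  x = value || rand(range)
  format ? format % x : x
end

# :datetime with :format or :value (current time by default)
def gen_datetime(format: "%Y-%m-%d %H:%M:%S", value: nil)
  (value || Time.now).strftime(format)
end

next_id = countup(format: "%04d")
puts [next_id.call, gen_datetime(format: "[%Y-%m-%d %H:%M:%S]"),
      gen_string(any: %w[DEBUG INFO WARN ERROR]),
      gen_float(range: 0.1..5.0), gen_string(length: 8)].join("\t")
```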
I created a simple version of dummer, since dummer could not reach the maximum system I/O throughput because of its rich features.
This simple version, dummer_simple, could reach the system I/O limit in my environment.
Sorry, but this simple script cannot post data to a fluentd process; it only supports writing to a file.
$ dummer_simple [options]
Usage:
dummer_simple
Options:
[--sync] # Set `IO#sync=true`
-s, [--second=N] # Duration of running in second
# Default: 1
-p, [--parallel=N] # Number of processes to run in parallel
# Default: 1
-o, [--output=OUTPUT] # Output file
# Default: dummy.log
-i, [--input=INPUT] # Input file (Output messages by reading lines of the file in rotation)
-m, [--message=MESSAGE] # Output message
# Default: time:2013-11-20 23:39:42 +0900 level:ERROR method:POST uri:/api/v1/people reqtime:3.1983877060667103
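The idea behind these options (write one fixed message repeatedly for a given number of seconds, optionally with IO#sync set) can be sketched in a few lines of Ruby. This is only an illustration under assumptions: write_for is a hypothetical helper, it runs a single process (no --parallel), and dummer_simple's real implementation may differ.

```ruby
# Hypothetical sketch: append `message` to `path` as fast as possible
# for `seconds` seconds and return the number of lines written.
def write_for(path, message:, seconds: 1, sync: false)
  count = 0
  File.open(path, "a") do |io|
    io.sync = sync # true flushes every write instead of buffering
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
    while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
      io.puts(message)
      count += 1
    end
  end
  count
end

written = write_for("dummy.log", message: "level:ERROR method:POST", seconds: 0.1)
puts "#{written} lines written"
File.delete("dummy.log")
```

Dividing the returned count by the duration gives a messages-per-second figure that can be compared between buffered and synced writes.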
I created dummer_yes, a wrapped version of the yes command, to confirm that dummer_simple achieves the maximum system I/O throughput.
I do not use the dummer_yes command anymore because I verified that dummer_simple achieves the I/O limit, but I will keep it so that users can run verification experiments with it.
$ dummer_yes [options]
Usage:
dummer_yes
Options:
-s, [--second=N] # Duration of running in second
# Default: 1
-p, [--parallel=N] # Number of processes to run in parallel
# Default: 1
-o, [--output=OUTPUT] # Output file
# Default: dummy.log
-m, [--message=MESSAGE] # Output message
# Default: time:2013-11-20 23:39:42 +0900 level:ERROR method:POST uri:/api/v1/people reqtime:3.1983877060667103
There is a fluent-plugin-dummydata-producer, but I wanted to output dummy data to a log file, and I wanted a separate, standalone tool for benchmarking.
- write tests
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
See LICENSE.txt