sensu-plugins/sensu-plugins-postgres

Replication check defaults to boolean instead of integer

VeselaHouba opened this issue · 10 comments

When running the check like this:

check-postgres-replication.rb -m master -s slave -d db_name -u user -p pass

I get following output

outputCheck failed to run: invalid integer value "" for connection option "connect_timeout"
, ["/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `initialize'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `new'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `connect'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-postgres-2.4.0/bin/check-postgres-replication.rb:108:in `run'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.7/lib/sensu-plugin/cli.rb:58:in `block in '"]

When I manually specify timeout, it still has issues as it is casted to boolean instead of integer

check-postgres-replication.rb -m master -s slave -d db_name -u user -p pass -t 10

Check failed to run: invalid integer value "true" for connection option "connect_timeout"
, ["/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `initialize'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `new'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `connect'", "./lib/ruby/gems/2.4.0/gems/sensu-plugins-postgres-2.4.0/bin/check-postgres-replication.rb:108:in `run'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.7/lib/sensu-plugin/cli.rb:58:in `block in <class:CLI>'"]

When I edit the file and put some integer value into default

check-postgres-replication.rb
97         default: nil,
-> 
97         default: 10,

Then things start working

CheckPostgresReplicationStatus OK: replication delayed by 0.0MB :: master:1/53007AE8 slave:1/53007AE8 m_segbytes:16777216

Using following ruby on debian10

/opt/sensu/embedded/bin/ruby --version
ruby 2.4.5p335 (2018-10-18 revision 65137) [x86_64-linux]

We can try to convert the nil object to an integer for a sensible default

2.4.1 :001 > nil
=> nil
2.4.1 :002 > nil.to_i
=> 0

or pass a sane default. Maybe 2 (IIRC 2s is the minimum connect_timeout value for libpq).

@VeselaHouba @majormoses WDYT?

So looking at: https://github.com/sensu-plugins/sensu-plugins-postgres/blob/3.0.0/bin/check-postgres-replication.rb#L94-L98 I might suggest something along the lines of:

option(:timeout,
         short: '-T',
         long: '--timeout',
         default: nil,
         description: 'Connection timeout (seconds)',
         proc: proc(&:to_i))

This will always ensure a boolean regardless of what is passed in. If the upstream PG library only supports values of 2 > then I think we should add a check and omit the param if < 2.
https://github.com/sensu-plugins/sensu-plugins-postgres/blob/3.0.0/bin/check-postgres-replication.rb#L108-L114 unless it auto gracefully handles that scenerio:

if config[:timeout] < 2
  conn_master = PG.connect(host: config[:master_host],
                             dbname: config[:database],
                             user: config[:user],
                             password: config[:password],
                             port: config[:port],
                             sslmode: ssl_mode)
else
  conn_master = PG.connect(host: config[:master_host],
                             dbname: config[:database],
                             user: config[:user],
                             password: config[:password],
                             port: config[:port],
                             sslmode: ssl_mode,
                             connect_timeout: config[:timeout])
end

If the postgres library will gracefully handle < 2 then the above is not needed and simply casting it to an an integer is the right move. I don't know if setting it to a 2 second default might be desired or not.

P.S. Ignore the weird tabbing Github has some incredibly odd way they deal with indented whitespace/tabs 🤷‍♂️

We can try to convert the nil object to an integer for a sensible default

2.4.1 :001 > nil
=> nil
2.4.1 :002 > nil.to_i
=> 0

or pass a sane default. Maybe 2 (IIRC 2s is the minimum connect_timeout value for libpq).

@VeselaHouba @majormoses WDYT?

Something like this works for me for both defined and undefined, so library can handle 0 as timeout.

   option(:timeout,
          short: '-T',
          long: '--timeout=VALUE',
          default: nil.to_i,
          description: 'Connection timeout (seconds)')

Note the added =VALUE - this is the reason why it kept switching to boolean.

default: nil.to_i,

No need to .to_i on the default, its best to do it for anything passed in via proc.

Here is the lipbq implementation of connect timeout

    /*
     * Set up a time limit, if connect_timeout isn't zero.
     */
    if (conn->connect_timeout != NULL)
    {
        if (!parse_int_param(conn->connect_timeout, &timeout, conn,
                             "connect_timeout"))
        {
            /* mark the connection as bad to report the parsing failure */
            conn->status = CONNECTION_BAD;
            return 0;
        }

        if (timeout > 0)
        {
            /*
             * Rounding could cause connection to fail unexpectedly quickly;
             * to prevent possibly waiting hardly-at-all, insist on at least
             * two seconds.
             */
            if (timeout < 2)
                timeout = 2;
        }
        else                    /* negative means 0 */
            timeout = 0;
    }

I suspect we can just cast as an int. @majormoses ?

@phumpal I think this seems like the right solution to me, can one of you try out that patch in 120 and see if that resolves it (I am pretty sure it will).

@majormoses thanks for putting this together.

Here's the result of #120

check-postgres-replication.rb -m 192.168.100.12 --slave-host=192.168.100.13 -P 6543 -d pg_test_dev -u foo -p bar -w 900 -c 1800 -t 5
CheckPostgresReplicationStatus OK: replication delayed by 0.0078582763671875MB :: master:60B9/FC090000 slave:60B9/FC08DFD0 m_segbytes:16777216

FWIW I am having trouble reproducing the original issue on ruby-2.4.1. I will launch a test box to test on ruby.2.4.0 which was used in the OP.

Before 120

ruby -v && check-postgres-replication.rb -m 192.168.100.12 --slave-host=192.168.100.13 -P 6543 -d pg_test_dev -u foo -p bar -w 900 -c 1800
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
Check failed to run: invalid integer value "" for connection option "connect_timeout"
, ["/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `initialize'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `new'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `connect'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/sensu-plugins-postgres-3.0.0/bin/check-postgres-replication.rb:108:in `run'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/sensu-plugin-1.4.7/lib/sensu-plugin/cli.rb:58:in `block in <class:CLI>'"]

After 120

ruby -v && check-postgres-replication.rb -m 192.168.100.12 --slave-host=192.168.100.13 -P 6543 -d pg_test_dev -u foo -p bar -w 900 -c 1800
Check failed to run: invalid integer value "" for connection option "connect_timeout"
, ["/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `initialize'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `new'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `connect'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/sensu-plugins-postgres-3.0.0/bin/check-postgres-replication.rb:115:in `run'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/sensu-plugin-1.4.7/lib/sensu-plugin/cli.rb:58:in `block in <class:CLI>'"]

@majormoses I suspect we may need a non-nil default value.

         default: 0,
         description: 'Connection timeout (seconds)',
         proc: proc(&:to_i))

@phumpal I decided to set the default to 2 per the code you linked.