Replication check defaults to boolean instead of integer
VeselaHouba opened this issue · 10 comments
When running the check like this:
check-postgres-replication.rb -m master -s slave -d db_name -u user -p pass
I get following output
outputCheck failed to run: invalid integer value "" for connection option "connect_timeout"
, ["/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `initialize'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `new'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `connect'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-postgres-2.4.0/bin/check-postgres-replication.rb:108:in `run'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.7/lib/sensu-plugin/cli.rb:58:in `block in '"]
When I manually specify timeout, it still has issues as it is casted to boolean instead of integer
check-postgres-replication.rb -m master -s slave -d db_name -u user -p pass -t 10
Check failed to run: invalid integer value "true" for connection option "connect_timeout"
, ["/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `initialize'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `new'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/pg-0.18.3/lib/pg.rb:45:in `connect'", "./lib/ruby/gems/2.4.0/gems/sensu-plugins-postgres-2.4.0/bin/check-postgres-replication.rb:108:in `run'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.7/lib/sensu-plugin/cli.rb:58:in `block in <class:CLI>'"]
When I edit the file and put some integer value into default
check-postgres-replication.rb
97 default: nil,
->
97 default: 10,
Then things start working
CheckPostgresReplicationStatus OK: replication delayed by 0.0MB :: master:1/53007AE8 slave:1/53007AE8 m_segbytes:16777216
Using following ruby on debian10
/opt/sensu/embedded/bin/ruby --version
ruby 2.4.5p335 (2018-10-18 revision 65137) [x86_64-linux]
We can try to convert the nil object to an integer for a sensible default
2.4.1 :001 > nil
=> nil
2.4.1 :002 > nil.to_i
=> 0
or pass a sane default. Maybe 2
(IIRC 2s is the minimum connect_timeout
value for libpq).
@VeselaHouba @majormoses WDYT?
So looking at: https://github.com/sensu-plugins/sensu-plugins-postgres/blob/3.0.0/bin/check-postgres-replication.rb#L94-L98 I might suggest something along the lines of:
option(:timeout,
short: '-T',
long: '--timeout',
default: nil,
description: 'Connection timeout (seconds)',
proc: proc(&:to_i))
This will always ensure a boolean regardless of what is passed in. If the upstream PG library only supports values of 2 >
then I think we should add a check and omit the param if < 2
.
https://github.com/sensu-plugins/sensu-plugins-postgres/blob/3.0.0/bin/check-postgres-replication.rb#L108-L114 unless it auto gracefully handles that scenerio:
if config[:timeout] < 2
conn_master = PG.connect(host: config[:master_host],
dbname: config[:database],
user: config[:user],
password: config[:password],
port: config[:port],
sslmode: ssl_mode)
else
conn_master = PG.connect(host: config[:master_host],
dbname: config[:database],
user: config[:user],
password: config[:password],
port: config[:port],
sslmode: ssl_mode,
connect_timeout: config[:timeout])
end
If the postgres library will gracefully handle < 2
then the above is not needed and simply casting it to an an integer is the right move. I don't know if setting it to a 2 second default might be desired or not.
P.S. Ignore the weird tabbing Github has some incredibly odd way they deal with indented whitespace/tabs 🤷♂️
We can try to convert the nil object to an integer for a sensible default
2.4.1 :001 > nil
=> nil
2.4.1 :002 > nil.to_i
=> 0or pass a sane default. Maybe
2
(IIRC 2s is the minimumconnect_timeout
value for libpq).@VeselaHouba @majormoses WDYT?
Something like this works for me for both defined and undefined, so library can handle 0 as timeout.
option(:timeout,
short: '-T',
long: '--timeout=VALUE',
default: nil.to_i,
description: 'Connection timeout (seconds)')
Note the added =VALUE - this is the reason why it kept switching to boolean.
default: nil.to_i,
No need to .to_i
on the default, its best to do it for anything passed in via proc.
Here is the lipbq implementation of connect timeout
/*
* Set up a time limit, if connect_timeout isn't zero.
*/
if (conn->connect_timeout != NULL)
{
if (!parse_int_param(conn->connect_timeout, &timeout, conn,
"connect_timeout"))
{
/* mark the connection as bad to report the parsing failure */
conn->status = CONNECTION_BAD;
return 0;
}
if (timeout > 0)
{
/*
* Rounding could cause connection to fail unexpectedly quickly;
* to prevent possibly waiting hardly-at-all, insist on at least
* two seconds.
*/
if (timeout < 2)
timeout = 2;
}
else /* negative means 0 */
timeout = 0;
}
I suspect we can just cast as an int. @majormoses ?
@phumpal I think this seems like the right solution to me, can one of you try out that patch in 120 and see if that resolves it (I am pretty sure it will).
@majormoses thanks for putting this together.
Here's the result of #120
check-postgres-replication.rb -m 192.168.100.12 --slave-host=192.168.100.13 -P 6543 -d pg_test_dev -u foo -p bar -w 900 -c 1800 -t 5
CheckPostgresReplicationStatus OK: replication delayed by 0.0078582763671875MB :: master:60B9/FC090000 slave:60B9/FC08DFD0 m_segbytes:16777216
FWIW I am having trouble reproducing the original issue on ruby-2.4.1. I will launch a test box to test on ruby.2.4.0 which was used in the OP.
Before 120
ruby -v && check-postgres-replication.rb -m 192.168.100.12 --slave-host=192.168.100.13 -P 6543 -d pg_test_dev -u foo -p bar -w 900 -c 1800
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
Check failed to run: invalid integer value "" for connection option "connect_timeout"
, ["/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `initialize'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `new'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `connect'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/sensu-plugins-postgres-3.0.0/bin/check-postgres-replication.rb:108:in `run'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/sensu-plugin-1.4.7/lib/sensu-plugin/cli.rb:58:in `block in <class:CLI>'"]
After 120
ruby -v && check-postgres-replication.rb -m 192.168.100.12 --slave-host=192.168.100.13 -P 6543 -d pg_test_dev -u foo -p bar -w 900 -c 1800
Check failed to run: invalid integer value "" for connection option "connect_timeout"
, ["/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `initialize'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `new'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/pg-1.1.0/lib/pg.rb:56:in `connect'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/sensu-plugins-postgres-3.0.0/bin/check-postgres-replication.rb:115:in `run'", "/opt/sensu/.rvm/gems/ruby-2.4.0/gems/sensu-plugin-1.4.7/lib/sensu-plugin/cli.rb:58:in `block in <class:CLI>'"]
@majormoses I suspect we may need a non-nil default value.
default: 0,
description: 'Connection timeout (seconds)',
proc: proc(&:to_i))
@phumpal I decided to set the default to 2
per the code you linked.