setting jdbc_default_timezone in one plugin instance affects behavior of plugins where it is not set
yaauie opened this issue · 1 comments
Logstash information:
Please include the following information:
- Logstash version: 7.x -> 8.x
- Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker): any
- How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes): any
- How was the Logstash Plugin installed: default
JVM (e.g. java -version
): any
OS version (uname -a
if on a Unix-like system): any
Description of the problem including expected versus actual behavior**:
-
GIVEN: a database table with times stored in a local timezone
-
GIVEN: a logstash instance in the same shared local timezone
-
OBSERVED: an instance of the JDBC input plugin configured without specifying
jdbc_default_timezone
will load timestamps correctly (that is:2023-12-31T18:00:00.000
inAmerica/Chicago
becomes2024-01-01T00:00:00.000Z
) -
HOWEVER: if another instance of the jdbc input plugin whose
jdbc_default_timezone
is set exists (or has ever existed) in the process: -
OBSERVED: an instance of the JDBC input plugin configured without specifying
jdbc_default_timezone
will load timestamps incorrectly (that is:2023-12-31T18:00:00.000
inAmerica/Chicago
becomes2023-12-31T18:00:00.000Z
)
Workaround
Always supply the jdbc_default_timezone
when defining jdbc input plugins, optionally using TZ
environment variable that is present on most systems and controls the Logstash process's timezone:
jdbc_default_timezone => "${TZ}"
Steps to reproduce:
This is easiest to reproduce in postgresql, although the concept applies to any database vendor.
-
download the jdbc driver jar, and export its absolute path to
POSTGRESQL_JDBC_DRIVER_JAR
:wget https://jdbc.postgresql.org/download/postgresql-42.7.1.jar export POSTGRESQL_JDBC_DRIVER_JAR="${PWD}/postgresql-42.7.1.jar"
-
bootstrap a local postgresql server and load it with data. In this case we create a view containing rows that have equivalent timestamps from the perspective of different time zones:
alter database postgres set timezone to 'America/Chicago'; create table timezones (id varchar(32) primary key); insert into timezones values ('Etc/UTC'),('America/Chicago'),('Europe/Berlin'); create view timestamps_annotated as (select id as timezone, utc_iso8601, utc_iso8601::timestamptz at time zone id as timestamp_value from (select id, '2024-01-01T00:00:00.000Z' as utc_iso8601 from timezones) as intermediate);
and observe the values for
timestamp_value
, in the context of the respectivetimezone
are equivalent to the originutc_iso8601
value:postgres=# select * from timestamps_annotated; timezone | utc_iso8601 | timestamp_value -----------------+--------------------------+--------------------- Etc/UTC | 2024-01-01T00:00:00.000Z | 2024-01-01 00:00:00 America/Chicago | 2024-01-01T00:00:00.000Z | 2023-12-31 18:00:00 Europe/Berlin | 2024-01-01T00:00:00.000Z | 2024-01-01 01:00:00 (3 rows)
-
create
readonly
role with appropriate permissions to select from the table and aread_user
who has the role:create role readaccess; grant connect on database postgres to readaccess; grant usage on schema public to readaccess; grant select on timestamps_annotated to readaccess; create user read_user with password 'p4$$w0rd'; grant readaccess to read_user;
-
run a pipeline with a single jdbc input that has no
jdbc_default_timezone
:input { jdbc { statement => "select * from timestamps_annotated WHERE timezone = 'America/Chicago'" jdbc_connection_string => "jdbc:postgresql:postgres" jdbc_user => "read_user" jdbc_password => "p4$$w0rd" jdbc_driver_class => "org.postgresql.Driver" jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}" add_field => { "note" => "all-defaults" } } } output { stdout { codec => rubydebug } }
--
single-plugin-no-tz-directives.conf
TZ="America/Chicago" ${LOGSTASH_HOME}/bin/logstash -f "${PWD}/single-plugin-no-tz-directives.conf"
Observe that the
timestamp_value
is equivalent to the originutc_iso8601
timestamp, which is the correct behaviour we expect:{ "timezone" => "America/Chicago", "timestamp_value" => 2024-01-01T00:00:00.000Z, "@timestamp" => 2024-01-17T23:34:13.739663Z, "utc_iso8601" => "2024-01-01T00:00:00.000Z", "@version" => "1", "note" => "all-defaults" }
-
run a pipeline that has a mix of jdbc input plugins, some specifying
jdbc_default_timezone
and some not:input { jdbc { statement => "select * from timestamps_annotated WHERE timezone = 'America/Chicago'" jdbc_connection_string => "jdbc:postgresql:postgres" jdbc_user => "read_user" jdbc_password => "p4$$w0rd" jdbc_driver_class => "org.postgresql.Driver" jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}" add_field => { "note" => "all-defaults" } } jdbc { statement => "select * from timestamps_annotated WHERE timezone = 'America/Chicago'" jdbc_connection_string => "jdbc:postgresql:postgres" jdbc_user => "read_user" jdbc_password => "p4$$w0rd" jdbc_driver_class => "org.postgresql.Driver" jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}" jdbc_default_timezone => "America/Chicago" add_field => { "note" => "set jdbc_default_timezone=America/Chicago" } } jdbc { statement => "select * from timestamps_annotated WHERE timezone = 'America/Chicago'" jdbc_connection_string => "jdbc:postgresql:postgres" jdbc_user => "read_user" jdbc_password => "p4$$w0rd" jdbc_driver_class => "org.postgresql.Driver" jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}" jdbc_default_timezone => "${TZ}" add_field => { "note" => "set jdbc_default_timezone=ENV[TZ]" } } jdbc { statement => "select * from timestamps_annotated WHERE timezone = 'Etc/UTC'" jdbc_connection_string => "jdbc:postgresql:postgres" jdbc_user => "read_user" jdbc_password => "p4$$w0rd" jdbc_driver_class => "org.postgresql.Driver" jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}" jdbc_default_timezone => "Etc/UTC" add_field => { "note" => "set jdbc_default_timezone=Etc/UTC" } } jdbc { statement => "select * from timestamps_annotated WHERE timezone = 'Europe/Berlin'" jdbc_connection_string => "jdbc:postgresql:postgres" jdbc_user => "read_user" jdbc_password => "p4$$w0rd" jdbc_driver_class => "org.postgresql.Driver" jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}" jdbc_default_timezone => "Europe/Berlin" add_field => { "note" => "set jdbc_default_timezone=Europe/Berlin" } } } output { stdout { codec => rubydebug } }
--
multiple-plugins-mixed.conf
TZ="America/Chicago" ${LOGSTASH_HOME}/bin/logstash -f "${PWD}/multiple-plugins-mixed.conf"
Observe that all plugins that have
jdbc_default_timezone
defined correctly map theirtimestamp_value
, but the theall-defaults
plugin instance whose definition has not changed no longer handlestimestamp_value
correctly, and instead handles the value as if it were inEtc/UTC
:{ "@version" => "1", "@timestamp" => 2024-01-17T23:43:24.743200Z, "timestamp_value" => 2024-01-01T00:00:00.000Z, "note" => "set jdbc_default_timezone=Europe/Berlin", "timezone" => "Europe/Berlin", "utc_iso8601" => "2024-01-01T00:00:00.000Z" } { "@version" => "1", "@timestamp" => 2024-01-17T23:43:24.743198Z, "timestamp_value" => 2024-01-01T00:00:00.000Z, "note" => "set jdbc_default_timezone=Etc/UTC", "timezone" => "Etc/UTC", "utc_iso8601" => "2024-01-01T00:00:00.000Z" } { "@version" => "1", "@timestamp" => 2024-01-17T23:43:24.743198Z, "timestamp_value" => 2023-12-31T18:00:00.000Z, "note" => "all-defaults", "timezone" => "America/Chicago", "utc_iso8601" => "2024-01-01T00:00:00.000Z" } { "@version" => "1", "@timestamp" => 2024-01-17T23:43:24.743200Z, "timestamp_value" => 2024-01-01T00:00:00.000Z, "note" => "set jdbc_default_timezone=America/Chicago", "timezone" => "America/Chicago", "utc_iso8601" => "2024-01-01T00:00:00.000Z" } { "@version" => "1", "@timestamp" => 2024-01-17T23:43:24.743204Z, "timestamp_value" => 2024-01-01T00:00:00.000Z, "note" => "set jdbc_default_timezone=ENV[TZ]", "timezone" => "America/Chicago", "utc_iso8601" => "2024-01-01T00:00:00.000Z" }
Analysis
This issue shares a root cause with #53.
It boils down to a difference in the two different paths taken for time-like objects depending on the Sequel.datetime_class
, which is modified when a single plugin contains a jdbc_default_timezone
directive and activates Sequel's named timezones extension.
Time
(default): usesTime.parse
, which assumes local offset when none is given, and since the database is not configured with a timezone, no additional shifting is necessaryDateTime
(when any instance hasjdbc_default_timezone
): usesDateTime.parse
, which assumes UTC when no offset is given, and relies on the database instance's current timezone (as configured by the plugin'sjdbc_default_timezone
) to shift to the appropriate offset. When a plugin's database does not have a current timezone, the value remains in UTC.
Two leads:
- Sequel v5.23.0:
Sequel.datetime_class = Time is now supported when using the
named_timezones extension. - Behaviour of
jdbc_default_timezone => "${TZ}"
appears identical to the behaviour when the named timezones extension is not enabled (need to validate in context ofplugin_timezone
option, which is also leaky because it modifiesSequel.default_timezone
)