logstash-plugins/logstash-integration-jdbc

setting jdbc_default_timezone in one plugin instance affects behavior of plugins where it is not set

yaauie opened this issue · 1 comments

Logstash information:

Please include the following information:

  1. Logstash version: 7.x -> 8.x
  2. Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker): any
  3. How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes): any
  4. How was the Logstash Plugin installed: default

JVM (e.g. java -version): any

OS version (uname -a if on a Unix-like system): any

Description of the problem including expected versus actual behavior**:

  • GIVEN: a database table with times stored in a local timezone

  • GIVEN: a logstash instance in the same shared local timezone

  • OBSERVED: an instance of the JDBC input plugin configured without specifying jdbc_default_timezone will load timestamps correctly (that is: 2023-12-31T18:00:00.000 in America/Chicago becomes 2024-01-01T00:00:00.000Z)

  • HOWEVER: if another instance of the jdbc input plugin whose jdbc_default_timezone is set exists (or has ever existed) in the process:

  • OBSERVED: an instance of the JDBC input plugin configured without specifying jdbc_default_timezone will load timestamps incorrectly (that is: 2023-12-31T18:00:00.000 in America/Chicago becomes 2023-12-31T18:00:00.000Z)

Workaround

Always supply the jdbc_default_timezone when defining jdbc input plugins, optionally using TZ environment variable that is present on most systems and controls the Logstash process's timezone:

    jdbc_default_timezone => "${TZ}"

Steps to reproduce:

This is easiest to reproduce in postgresql, although the concept applies to any database vendor.

  1. download the jdbc driver jar, and export its absolute path to POSTGRESQL_JDBC_DRIVER_JAR:

    wget https://jdbc.postgresql.org/download/postgresql-42.7.1.jar
    export POSTGRESQL_JDBC_DRIVER_JAR="${PWD}/postgresql-42.7.1.jar"
    
  2. bootstrap a local postgresql server and load it with data. In this case we create a view containing rows that have equivalent timestamps from the perspective of different time zones:

    alter database postgres set timezone to 'America/Chicago';
    create table timezones (id varchar(32) primary key);
    insert into timezones values ('Etc/UTC'),('America/Chicago'),('Europe/Berlin');
    create view timestamps_annotated as (select id as timezone, utc_iso8601, utc_iso8601::timestamptz at time zone id as timestamp_value from (select id, '2024-01-01T00:00:00.000Z' as utc_iso8601 from timezones) as intermediate);
    

    and observe the values for timestamp_value, in the context of the respective timezone are equivalent to the origin utc_iso8601 value:

    postgres=# select * from timestamps_annotated;
        timezone     |       utc_iso8601        |   timestamp_value
    -----------------+--------------------------+---------------------
     Etc/UTC         | 2024-01-01T00:00:00.000Z | 2024-01-01 00:00:00
     America/Chicago | 2024-01-01T00:00:00.000Z | 2023-12-31 18:00:00
     Europe/Berlin   | 2024-01-01T00:00:00.000Z | 2024-01-01 01:00:00
    (3 rows)
    
  3. create readonly role with appropriate permissions to select from the table and a read_user who has the role:

    create role readaccess;
    grant connect on database postgres to readaccess;
    grant usage on schema public to readaccess;
    grant select on timestamps_annotated to readaccess;
    create user read_user with password 'p4$$w0rd';
    grant readaccess to read_user;
    
  4. run a pipeline with a single jdbc input that has no jdbc_default_timezone:

    input {
    	jdbc {
    		statement => "select * from timestamps_annotated WHERE timezone = 'America/Chicago'"
    		jdbc_connection_string => "jdbc:postgresql:postgres"
    		jdbc_user => "read_user"
    		jdbc_password => "p4$$w0rd"
    		jdbc_driver_class => "org.postgresql.Driver"
    		jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}"
    		add_field => { "note" => "all-defaults" }
    	}
    }
    output { stdout { codec => rubydebug } }
    

    -- single-plugin-no-tz-directives.conf

    TZ="America/Chicago" ${LOGSTASH_HOME}/bin/logstash -f "${PWD}/single-plugin-no-tz-directives.conf"
    

    Observe that the timestamp_value is equivalent to the origin utc_iso8601 timestamp, which is the correct behaviour we expect:

    {
               "timezone" => "America/Chicago",
        "timestamp_value" => 2024-01-01T00:00:00.000Z,
             "@timestamp" => 2024-01-17T23:34:13.739663Z,
            "utc_iso8601" => "2024-01-01T00:00:00.000Z",
               "@version" => "1",
                   "note" => "all-defaults"
    }
    
  5. run a pipeline that has a mix of jdbc input plugins, some specifying jdbc_default_timezone and some not:

    input {
    	jdbc {
    		statement => "select * from timestamps_annotated WHERE timezone = 'America/Chicago'"
    		jdbc_connection_string => "jdbc:postgresql:postgres"
    		jdbc_user => "read_user"
    		jdbc_password => "p4$$w0rd"
    		jdbc_driver_class => "org.postgresql.Driver"
    		jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}"
    		add_field => { "note" => "all-defaults" }
    	}
    	jdbc {
    		statement => "select * from timestamps_annotated WHERE timezone = 'America/Chicago'"
    		jdbc_connection_string => "jdbc:postgresql:postgres"
    		jdbc_user => "read_user"
    		jdbc_password => "p4$$w0rd"
    		jdbc_driver_class => "org.postgresql.Driver"
    		jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}"
    		jdbc_default_timezone => "America/Chicago"
    		add_field => { "note" => "set jdbc_default_timezone=America/Chicago" }
    	}
    	jdbc {
    		statement => "select * from timestamps_annotated WHERE timezone = 'America/Chicago'"
    		jdbc_connection_string => "jdbc:postgresql:postgres"
    		jdbc_user => "read_user"
    		jdbc_password => "p4$$w0rd"
    		jdbc_driver_class => "org.postgresql.Driver"
    		jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}"
    		jdbc_default_timezone => "${TZ}"
    		add_field => { "note" => "set jdbc_default_timezone=ENV[TZ]" }
    	}
    	jdbc {
    		statement => "select * from timestamps_annotated WHERE timezone = 'Etc/UTC'"
    		jdbc_connection_string => "jdbc:postgresql:postgres"
    		jdbc_user => "read_user"
    		jdbc_password => "p4$$w0rd"
    		jdbc_driver_class => "org.postgresql.Driver"
    		jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}"
    		jdbc_default_timezone => "Etc/UTC"
    		add_field => { "note" => "set jdbc_default_timezone=Etc/UTC" }
    	}
    	jdbc {
    		statement => "select * from timestamps_annotated WHERE timezone = 'Europe/Berlin'"
    		jdbc_connection_string => "jdbc:postgresql:postgres"
    		jdbc_user => "read_user"
    		jdbc_password => "p4$$w0rd"
    		jdbc_driver_class => "org.postgresql.Driver"
    		jdbc_driver_library => "${POSTGRESQL_JDBC_DRIVER_JAR}"
    		jdbc_default_timezone => "Europe/Berlin"
    		add_field => { "note" => "set jdbc_default_timezone=Europe/Berlin" }
    	}
    }
    output { stdout { codec => rubydebug } }
    

    -- multiple-plugins-mixed.conf

     TZ="America/Chicago" ${LOGSTASH_HOME}/bin/logstash -f "${PWD}/multiple-plugins-mixed.conf"
    

    Observe that all plugins that have jdbc_default_timezone defined correctly map their timestamp_value, but the the all-defaults plugin instance whose definition has not changed no longer handles timestamp_value correctly, and instead handles the value as if it were in Etc/UTC:

    {
               "@version" => "1",
             "@timestamp" => 2024-01-17T23:43:24.743200Z,
        "timestamp_value" => 2024-01-01T00:00:00.000Z,
                   "note" => "set jdbc_default_timezone=Europe/Berlin",
               "timezone" => "Europe/Berlin",
            "utc_iso8601" => "2024-01-01T00:00:00.000Z"
    }
    {
               "@version" => "1",
             "@timestamp" => 2024-01-17T23:43:24.743198Z,
        "timestamp_value" => 2024-01-01T00:00:00.000Z,
                   "note" => "set jdbc_default_timezone=Etc/UTC",
               "timezone" => "Etc/UTC",
            "utc_iso8601" => "2024-01-01T00:00:00.000Z"
    }
    {
               "@version" => "1",
             "@timestamp" => 2024-01-17T23:43:24.743198Z,
        "timestamp_value" => 2023-12-31T18:00:00.000Z,
                   "note" => "all-defaults",
               "timezone" => "America/Chicago",
            "utc_iso8601" => "2024-01-01T00:00:00.000Z"
    }
    {
               "@version" => "1",
             "@timestamp" => 2024-01-17T23:43:24.743200Z,
        "timestamp_value" => 2024-01-01T00:00:00.000Z,
                   "note" => "set jdbc_default_timezone=America/Chicago",
               "timezone" => "America/Chicago",
            "utc_iso8601" => "2024-01-01T00:00:00.000Z"
    }
    {
               "@version" => "1",
             "@timestamp" => 2024-01-17T23:43:24.743204Z,
        "timestamp_value" => 2024-01-01T00:00:00.000Z,
                   "note" => "set jdbc_default_timezone=ENV[TZ]",
               "timezone" => "America/Chicago",
            "utc_iso8601" => "2024-01-01T00:00:00.000Z"
    }
    

Analysis

This issue shares a root cause with #53.

It boils down to a difference in the two different paths taken for time-like objects depending on the Sequel.datetime_class, which is modified when a single plugin contains a jdbc_default_timezone directive and activates Sequel's named timezones extension.

  • Time (default): uses Time.parse, which assumes local offset when none is given, and since the database is not configured with a timezone, no additional shifting is necessary
  • DateTime (when any instance has jdbc_default_timezone): uses DateTime.parse, which assumes UTC when no offset is given, and relies on the database instance's current timezone (as configured by the plugin's jdbc_default_timezone) to shift to the appropriate offset. When a plugin's database does not have a current timezone, the value remains in UTC.

Two leads:

  • Sequel v5.23.0:

    Sequel.datetime_class = Time is now supported when using the
    named_timezones extension.

  • Behaviour of jdbc_default_timezone => "${TZ}" appears identical to the behaviour when the named timezones extension is not enabled (need to validate in context of plugin_timezone option, which is also leaky because it modifies Sequel.default_timezone)