apache/arrow-java

Flight SQL JDBC: Fix timezone/timestamp handling

Opened this issue · 4 comments

Describe the enhancement requested

I believe the JDBC driver is handling timezones/timestamps/java.sql.Timestamp entirely incorrectly.

  • Timestamps are round-tripped through LocalDateTime always. I believe this is wrong for timestamps with timezones: we should not round trip through a wall clock time since the same wall clock time may correspond to multiple different instants (during a DST transition)

  • From experimenting with the PostgreSQL JDBC driver, it appears that when requesting a java.sql.Timestamp, the expected behavior is:

    • Naive timestamp: return a Timestamp that appears to have the right wall clock time in the given Calendar, or the system time zone if no calendar is supplied. (In other words, use Timestamp as a janky LocalDateTime.)
    • Zoned timestamp: return a Timestamp whose value is the timestamp in UTC. (In other words, use Timestamp as an Instant.)

    It appears our driver does not do this properly.

We also need to update the tests introduced in #840

So one of the interesting things I found while exploring this is that Avatica supports a timezone property: https://calcite.apache.org/avatica/docs/client_reference.html#timeZone

We implicitly use it in some places due to the Avatica code base. For example, ResultSet.getObject calls the AvaticaSite.get which, for timestamps, passes in the localCalendar. I started PR #878 to pass this in to the Timestamp vector accessor, and this could be a good place to finish solving this once and for all.

One of the difficult things here is figuring out what exactly the desired Calendar is supposed to mean, especially taking into account both the timezone property for the driver and the default JVM option.

  • The description for timezone in Avatica isn't particularly helpful: The timezone that will be used for dates and times, but it does seem to imply that this setting is meant to just override the system default. My guess is this is supposed to mean the same as explicitly passing in a calendar into the method, and is the default when none is passed. This value is also sent back to the server, which the server could interpret in a variety of ways.
  • Intuitively, I would expect the calendar to mean: "return the values in this desired timezone", (convert the value into the desired calendar).
  • The JDBC spec for getTimestamp says the given calendar to construct an appropriate millisecond value for the timestamp if the underlying database does not store timezone information, which seems to imply it should only apply to timestamps without timezone. Additionally, it still doesn't specify if this means "convert timestamp value to the given calendar" or "assume the timestamp is in the given calendar"

When I tested the PostgreSQL JDBC driver, timestamptz values indeed ignored the calendar. BUT: it depended on if you requested a timestamp or Date. I need to dig up and fix up the test I was constructing to put it somewhere as the behavior I observed was rather confusing.

Hi, I just saw this issue with timezones. Maybe, it is related with this one?
GH-36518: [Java] Fix ArrowFlightJdbcTimeStampVectorAccessor to return Timestamp objects with date and time that corresponds with local time instead of UTC date and time

That issue was closed for adding changes with #464

I tested the driver after #464 changes and the behavior is still the same.

In #43149 there is also a comparison with Oracle and SQLServer drivers.