lensesio/stream-reactor

Misleading Date Format Pattern for CassandraSourceConnector

jp-9 opened this issue · 0 comments

jp-9 commented

Issue Guidelines

Please review these questions before submitting any issue.

What version of the Stream Reactor are you reporting this issue for?

bbd3c5b

Are you running the correct version of Kafka/Confluent for the Stream Reactor release?

Yes

Do you have a supported version of the data source/sink, i.e. Cassandra 3.0.9?

Yes

Have you read the docs?

Yes

What is the expected behaviour?

The CassandraDateFormatter uses the date format pattern: "yyyy-MM-dd HH:mm:ss.SSS'Z'"

  class CassandraDateFormatter {
    private val dateFormatPattern = "yyyy-MM-dd HH:mm:ss.SSS'Z'"  // <----- Hardcoded pattern

    def parse(date: String): Date = {
      val dateFormatter = new SimpleDateFormat(dateFormatPattern)
      dateFormatter.parse(date)
    }

    def format(date: Date): String = {
      val dateFormatter = new SimpleDateFormat(dateFormatPattern)
      dateFormatter.format(date)
    }

    def getYear(date: Date): Option[Int] = {
      val dateFormatter = new SimpleDateFormat("yyyy");
      dateFormatter.format(date).toIntOption
    }
  }

When setting my initial offset I want to do it in UTC time, so intuitively you would write something like this:
connect.cassandra.initial.offset=2022-12-22 18:00:00.000Z <--- the Z at the end usually indicates that this is a UTC+00 date.

However, the format pattern that is actually implemented is misleading. The date string must end in a Z to be parsed at all, but because the Z in the pattern is quoted, it is matched as a literal character and plays no part in determining the timezone; the date is interpreted in the JVM's default timezone instead. For the trailing Z to indicate UTC, the pattern has to be "yyyy-MM-dd HH:mm:ss.SSSX" (https://docs.oracle.com/en/java/javase/12/docs/api/java.base/java/text/SimpleDateFormat.html)
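To make the suggestion concrete, here is a minimal sketch of what a timezone-aware variant could look like. The class name UtcAwareDateFormatter and the choice to pin the formatter to UTC are my own assumptions for illustration, not code from the repository or a proposed patch.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Hypothetical variant of CassandraDateFormatter (illustration only).
class UtcAwareDateFormatter {
  // X is an ISO 8601 zone designator: a trailing "Z" is read as UTC
  // instead of being matched as a literal character.
  private val dateFormatPattern = "yyyy-MM-dd HH:mm:ss.SSSX"

  // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
  private def newFormatter(): SimpleDateFormat = {
    val formatter = new SimpleDateFormat(dateFormatPattern)
    // Assumption: format in UTC so that the output also ends in "Z".
    formatter.setTimeZone(TimeZone.getTimeZone("UTC"))
    formatter
  }

  def parse(date: String): Date = newFormatter().parse(date)

  def format(date: Date): String = newFormatter().format(date)
}
```

With this sketch, parsing "2022-12-22 12:00:00.000Z" yields the same instant regardless of the JVM's default timezone, and formatting that Date gives back the original string.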

An example to illustrate my point, assuming I am in UTC-05:00 (Eastern Standard Time).

Following the ISO 8601 convention that a trailing Z means UTC, "2022-12-22 12:00:00.000Z" should be Thu Dec 22 07:00:00 EST 2022

>> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS'Z'").parse("2022-12-22 12:00:00.000Z")
Output:
❌ Thu Dec 22 12:00:00 EST 2022

>> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSX").parse("2022-12-22 12:00:00.000Z")
Output:
✔ Thu Dec 22 07:00:00 EST 2022
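For anyone who wants to reproduce this without changing their machine's timezone, the following is a small self-contained sketch. The object name QuotedZDemo and the helper parseWith are made up for this example; the formatter is pinned to America/New_York to stand in for a JVM whose default zone is EST.

```scala
import java.text.SimpleDateFormat
import java.time.Instant
import java.util.TimeZone

// Hypothetical demo object, not part of the connector.
object QuotedZDemo {
  // Parse `input` with `pattern`, using America/New_York (UTC-05:00 in December)
  // as the formatter's zone, to mimic a JVM running in EST.
  def parseWith(pattern: String, input: String): Instant = {
    val formatter = new SimpleDateFormat(pattern)
    formatter.setTimeZone(TimeZone.getTimeZone("America/New_York"))
    formatter.parse(input).toInstant
  }

  def main(args: Array[String]): Unit = {
    val input = "2022-12-22 12:00:00.000Z"
    // Quoted 'Z' is a literal: 12:00 is taken as local EST time.
    println(parseWith("yyyy-MM-dd HH:mm:ss.SSS'Z'", input)) // 2022-12-22T17:00:00Z
    // X reads the Z as UTC: 12:00 stays 12:00 UTC.
    println(parseWith("yyyy-MM-dd HH:mm:ss.SSSX", input))   // 2022-12-22T12:00:00Z
  }
}
```

The two results are five hours apart, which is exactly the EST offset that the quoted Z silently applies.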

Was this design intentional? Is there a way to set the initial offset in Zulu Time?

** Edit: Accidentally included some of my test code in the CassandraDateFormatter copy