googleapis/python-bigquery

Support julian day timestamp value conversion to datetime.datetime

erenkeskus opened this issue · 4 comments

I'm using an image called ghcr.io/goccy/bigquery-emulator:0.6.1 as a test container in my unit tests.
However this image is using sqlite for database. I also use bigquery python client library 3.24.0.
The problem is the following:
sqlite is using floats to store timestamps (julian day form) whereas bigquery stores them as integers.
When converting the int into datetime in _timstamp_from_json function there is a usage of int() which throws a ValueError when a string of a float is provided.
The error:
ValueError: invalid literal for int() with base 10

My suggestion is to change the line in _helpers.py to int(float(value)).

Indeed, if you are interested in adding this, feel free to open a PR! Otherwise I'm happy to take care of it.

@Linchin @erenkeskus

Before we continue to review PR #1953 associated with this, I would like to better understand this issue. Can you help me?

I looked at Julian Dates and there are many formats. Depending on the expected format, the date that a float produces can vary widely (see this article: Julian Day).

  • What format are we expecting?
  • The function _datetime_from_microseconds() below is expecting a timedelta which it adds to the _EPOCH value... listed below. As shown on the wikipedia page above, epoch for each of the Julian Days varies widely, so what is the correlation between the Julian format we expect and Unix Epoch and do we need to account for that in this function?
  • By and large the datetime module expects dates based on the Gregorian calendar which differs from the Julian calendar by about 10 days (if memory serves). I don't know if this has an impact on what we get back if we pump in Julian dates.
  • What are some test cases we should be looking at to prove that the conversion happens as we expect it to? I would expect test cases that look at the specific format we expect and potentially and key dates for both the Julian calendar and the Gregorian calendar.
  • Depending on the complexity of the change and the potentially infrequent use of this functionality, is this a modification we need to add to the repository OR is this something that users should include in their code that calls our code?
_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)

...

def _datetime_from_microseconds(value):
    """Convert timestamp to datetime, assuming UTC.

    :type value: float
    :param value: The timestamp to convert

    :rtype: :class:`datetime.datetime`
    :returns: The datetime object created from the value.
    """
    return _EPOCH + datetime.timedelta(microseconds=value)

FYI: I will close this issue and associated PR tomorrow (Jun 28th) unless we have further dialogue.

Closing as will not fix.