bxparks/bigquery-schema-generator

Support for "UTC" suffix in TIMESTAMP data

rsmorris123 opened this issue · 1 comments

By default, exporting table data from BigQuery with bq extract writes UTC timestamps in the format "YYYY-MM-DD HH:MI:SS UTC". This is the same format the BigQuery Web UI displays when previewing data.
When this data is passed through the schema generator, the TIMESTAMP_MATCHER regex fails to match and the field is interpreted as a STRING in the JSON schema.
Running bq update with that JSON schema on the same table the data was exported from then fails, because the data type changes from TIMESTAMP to STRING.
This should be quite simple to fix: the regex needs an optional " UTC" suffix as an alternative to "Z".

Thanks for the report!

The weird thing is that the "UTC" suffix isn't even documented by Google, which lists only the following suffixes:

(+|-)H[H][:M[M]]
Z

As far as I understand, the "UTC" suffix is also not part of ISO 8601, which uses "Z" instead. However, various examples in the Google documentation refer to this "UTC" suffix. I verified that bq extract outputs TIMESTAMP fields using the "UTC" suffix instead of "Z", and that bq load also accepts it. Therefore, I have added support for this suffix.
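
For anyone reading along, here is a rough sketch of a timestamp check that accepts the "UTC" suffix alongside "Z" and a numeric offset. The pattern and the names `TIMESTAMP_SKETCH` and `looks_like_timestamp` are made up for illustration; this is not the project's actual TIMESTAMP_MATCHER and it may be stricter or looser than what BigQuery accepts.

```python
import re

# Hypothetical pattern for illustration only; NOT the project's
# actual TIMESTAMP_MATCHER.
TIMESTAMP_SKETCH = re.compile(
    r'^\d{4}-\d{1,2}-\d{1,2}'                # date: YYYY-M[M]-D[D]
    r'[T ]\d{1,2}:\d{1,2}:\d{1,2}'           # time: H[H]:M[M]:S[S]
    r'(\.\d{1,6})?'                          # optional fractional seconds
    r'( ?(UTC|Z|[-+]\d{1,2}(:\d{1,2})?))?$'  # optional zone suffix
)


def looks_like_timestamp(value):
    """Return True if the string matches the sketch pattern above."""
    return TIMESTAMP_SKETCH.match(value) is not None


# The format written by bq extract, and the "Z" form, both match.
print(looks_like_timestamp("2017-05-22 12:33:01 UTC"))
print(looks_like_timestamp("2017-05-22T12:33:01Z"))
```

The zone suffix is a single optional group, so values with no suffix still match, while arbitrary trailing text does not.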

The fix will be on the develop branch until I create a new release.