treasure-data/embulk-filter-add_time

Is it possible to create a column with just the date?

robert-carroll opened this issue · 0 comments

I have a schema which contains the following columns:

in:
  ...
    columns:
    - {name: timestamp, type: timestamp, format: '%Y-%m-%d %H:%M:%S.%L'}
    - {name: timestamp_year, type: string}
    - {name: timestamp_month, type: string}
    - {name: timestamp_day, type: string, format: '%Y-%m-%d'}
out:
  ...
  default_timezone: {{ env.EMBULK_TZ }}
  column_options:
    timestamp: {value_type: timestamp, timestamp_format: '%Y-%m-%d %H:%M:%S.%L'}
    timestamp_year: {type: 'INT NULL'}
    timestamp_month: {type: 'VARCHAR(256) NULL'}
    timestamp_day: {type: 'VARCHAR(256) NULL'}

I use the default_timezone option to change the timezones on all timestamp fields on import. The problem is I cannot query on the other fields timestamp_day specifically or the other timestamp_partials, because the result will always be off by the timezone difference to UTC. Obviously there are ways around this with SQL, but it still means that those columns are off by TZ difference.

I'm trying to use this filter to create/replace the existing column with just the partials of the converted timestamp field, but run into a few issues.

filters:
- type: add_time
  to_column:
    name: timestamp_day
    type: timestamp
    timestamp_format: "%Y-%m-%d"
  from_column:
    name: timestamp

This creates timestamp_day_, which is a text field, not a date field, and doesn't contain the right TimeZone update. Is there anyway to make this work, or is there consideration for adding 'date' type with timezone conversions? Unless I'm missing something in the other plugins, this seems like the closest option for my purpose.