googleapis/python-bigquery-sqlalchemy

Error when creating tables time-partitioned by a `DATE` column with type `DAY`

mvribeiro75 opened this issue · 2 comments

Environment details

  • OS type and version: ubuntu 20.04.6 LTS running on WSL2
  • Python version: 3.8
  • pip version: 24.0
  • sqlalchemy-bigquery version: 1.10.0

Steps to reproduce

  1. Run alembic init migrations
  2. Edit alembic.ini to include sqlalchemy.url = bigquery://<gcp-project-id>/<dataset>
  3. Create output_port.py with the SQLAlchemy model:
# output_port.py

from sqlalchemy import Table, Column, String, DATE
from google.cloud import bigquery

from sqlalchemy import MetaData
metadata_obj = MetaData()

my_model = Table(
    '<my_table>',
    metadata_obj,
    Column("PERSON_SK", String, nullable=True),
    Column("AVAILABLE_TIME", DATE, nullable=True),
    bigquery_time_partitioning=bigquery.TimePartitioning(
        field="AVAILABLE_TIME",
        type_="DAY",
    ),
    bigquery_require_partition_filter=True,
)
  4. Edit env.py to target the model:
...
from output_port import my_model
target_metadata = my_model.metadata
...
  5. Run alembic revision --autogenerate -m "create table"

The above command generated the file migrations/versions/aba04700771e_create_table.py. Inside this file, we see:

def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('<dataset>.<my_table>',
    sa.Column('PERSON_SK', sa.String(), nullable=True),
    sa.Column('AVAILABLE_TIME', sa.DATE(), nullable=True),
    bigquery_require_partition_filter=True,
    bigquery_time_partitioning=TimePartitioning(field='AVAILABLE_TIME',type_='DAY')
    )
    # ### end Alembic commands ###
  6. Run alembic upgrade head
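
Note: the generated upgrade() above references TimePartitioning, so if the version file does not already import it, an import along these lines presumably has to be added by hand before step 6 will run:

from google.cloud.bigquery import TimePartitioning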

Error

sqlalchemy.exc.DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/cloudmlx-d-098213/queries?prettyPrint=false: PARTITION BY expression must be _PARTITIONDATE, DATE(_PARTITIONTIME), DATE(<timestamp_column>), DATE(<datetime_column>), DATETIME_TRUNC(<datetime_column>, DAY/HOUR/MONTH/YEAR), a DATE column, TIMESTAMP_TRUNC(<timestamp_column>, DAY/HOUR/MONTH/YEAR), DATE_TRUNC(<date_column>, MONTH/YEAR), or RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<int64_value>, <int64_value>[, <int64_value>]))
[SQL: 
CREATE TABLE `<dataset>.<my_table>` (
        `PERSON_SK` STRING, 
        `AVAILABLE_TIME` DATE
) PARTITION BY DATE_TRUNC(AVAILABLE_TIME, DAY)
OPTIONS(require_partition_filter=true)

]
(Background on this error at: https://sqlalche.me/e/14/4xp6)

Note: I changed my dataset name to <dataset> and my table name to <my_table>.

Unfortunately the link above does not provide much information on this particular issue.

Possible explanation

According to the Google documentation, since AVAILABLE_TIME is of type DATE and the partitioning type is DAY, the generated SQL should not contain DATE_TRUNC at all but should partition by the DATE column itself; for DATE columns, DATE_TRUNC is only valid when partitioning by MONTH or YEAR.

It seems that the correct SQL query should be:

CREATE TABLE `<dataset>.<my_table>` (
        `PERSON_SK` STRING, 
        `AVAILABLE_TIME` DATE
) PARTITION BY AVAILABLE_TIME
OPTIONS(require_partition_filter=true)

Other tests

If I change the partitioning type to MONTH in the SQLAlchemy model, everything runs fine and the table is created successfully in BigQuery:

# output_port.py

from sqlalchemy import Table, Column, String, DATE
from google.cloud import bigquery

from sqlalchemy import MetaData
metadata_obj = MetaData()

my_model = Table(
    '<my_table>',
    metadata_obj,
    Column("PERSON_SK", String, nullable=True),
    Column("AVAILABLE_TIME", DATE, nullable=True),
    bigquery_time_partitioning=bigquery.TimePartitioning(
        field="AVAILABLE_TIME",
        type_="MONTH",
    ),
    bigquery_require_partition_filter=True,
)
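
For reference, the DDL the dialect emits in both cases can presumably be inspected without going through alembic at all, by compiling the CREATE TABLE statement directly against the dialect. A minimal sketch, assuming sqlalchemy-bigquery 1.10.0 exports BigQueryDialect (the script name and loop below are illustrative, not part of the steps above):

# compare_partition_ddl.py -- print the DDL emitted for DAY vs MONTH partitioning
from google.cloud import bigquery
from sqlalchemy import Column, DATE, MetaData, String, Table
from sqlalchemy.schema import CreateTable
from sqlalchemy_bigquery import BigQueryDialect


def build_table(partition_type):
    # Same model as above, parameterised on the TimePartitioning type
    return Table(
        "<my_table>",
        MetaData(),
        Column("PERSON_SK", String, nullable=True),
        Column("AVAILABLE_TIME", DATE, nullable=True),
        bigquery_time_partitioning=bigquery.TimePartitioning(
            field="AVAILABLE_TIME",
            type_=partition_type,
        ),
        bigquery_require_partition_filter=True,
    )


for partition_type in ("DAY", "MONTH"):
    ddl = CreateTable(build_table(partition_type)).compile(dialect=BigQueryDialect())
    # With 1.10.0, DAY reportedly yields "PARTITION BY DATE_TRUNC(AVAILABLE_TIME, DAY)"
    # (rejected by BigQuery), while MONTH yields a DATE_TRUNC expression BigQuery accepts.
    print(partition_type, "->", str(ddl).strip(), sep="\n")

If the DAY output does show DATE_TRUNC(AVAILABLE_TIME, DAY), that would pin the problem on the DDL compiler itself rather than on alembic.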

Thanks for submitting this issue. NOTE:

There is another issue in the queue that appears to deal with this same concern.
That issue also has a PR in the works.

Gonna close this as a duplicate.

Oops yes same issue, missed it when I opened mine. Thanks @chalmerlowe