Error when creating tables time partitioned by `DATE` column with type `DAY`
mvribeiro75 opened this issue · 2 comments
Environment details
- OS type and version: ubuntu 20.04.6 LTS running on WSL2
- Python version: 3.8
- pip version: 24.0
sqlalchemy-bigquery
version: 1.10.0
Steps to reproduce
- Run
alembic init migrations
- Edit alembic.ini to include
sqlalchemy.url = bigquery://<gcp-project-id>/<dataset>
- Created
output_port.py
with the sqlalchemy model:
# output_port.py
from sqlalchemy import Table, Column, String, DATE
from google.cloud import bigquery
from sqlalchemy import MetaData
metadata_obj = MetaData()
my_model = Table(
'<my_table>',
metadata_obj,
Column("PERSON_SK", String, nullable=True),
Column("AVAILABLE_TIME", DATE, nullable=True),
bigquery_time_partitioning=bigquery.TimePartitioning(
field="AVAILABLE_TIME",
type_="DAY",
),
bigquery_require_partition_filter=True,
)
- Edit
env.py
to target the model:
...
from output_port import my_model
target_metadata = my_model.metadata
...
- Run alembic revision --autogenerate -m "create table"
The above command generated the file migrations/versions/aba04700771e_create_table.py
. Inside this file, we see:
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.create_table('<dataset>.<my_table>',
sa.Column('PERSON_SK', sa.String(), nullable=True),
sa.Column('AVAILABLE_TIME', sa.DATE(), nullable=True),
bigquery_require_partition_filter=True,
bigquery_time_partitioning=TimePartitioning(field='AVAILABLE_TIME',type_='DAY')
)
# ### end Alembic commands ###
- Run
alembic upgrade head
Error
qlalchemy.exc.DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/cloudmlx-d-098213/queries?prettyPrint=false: PARTITION BY expression must be _PARTITIONDATE, DATE(_PARTITIONTIME), DATE(<timestamp_column>), DATE(<datetime_column>), DATETIME_TRUNC(<datetime_column>, DAY/HOUR/MONTH/YEAR), a DATE column, TIMESTAMP_TRUNC(<timestamp_column>, DAY/HOUR/MONTH/YEAR), DATE_TRUNC(<date_column>, MONTH/YEAR), or RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<int64_value>, <int64_value>[, <int64_value>]))
[SQL:
CREATE TABLE `<dataset>.<my_table>` (
`PERSON_SK` STRING,
`AVAILABLE_TIME` DATE
) PARTITION BY DATE_TRUNC(AVAILABLE_TIME, DAY)
OPTIONS(require_partition_filter=true)
]
(Background on this error at: https://sqlalche.me/e/14/4xp6)
Note: I changed my dataset name to <dataset>
and my table name to <my_table> .
Unfortunately the link above does not provide much information on this particular issue.
Possible explanation
From google documentation, since AVAILABLE_TIME
is of type DATE
and the partition is by DAY
, the SQL query should not contain DATE_TRUNC
. DATE_TRUNC
should be used with columns of type DATE
only when the partition is by MONTH
or YEAR
.
It seems that the correct SQL query should be:
CREATE TABLE `<dataset>.<my_table>` (
`PERSON_SK` STRING,
`AVAILABLE_TIME` DATE
) PARTITION BY AVAILABLE_TIME
OPTIONS(require_partition_filter=true)
Other tests
If I change the partition to MONTH
in the sqlalchemy model, everything runs fine and the table is successfully created in bigquery:
# output_port.py
from sqlalchemy import Table, Column, String, DATE
from google.cloud import bigquery
from sqlalchemy import MetaData
metadata_obj = MetaData()
my_model = Table(
'<my_table>',
metadata_obj,
Column("PERSON_SK", String, nullable=True),
Column("AVAILABLE_TIME", DATE, nullable=True),
bigquery_time_partitioning=bigquery.TimePartitioning(
field="AVAILABLE_TIME",
type_="MONTH",
),
bigquery_require_partition_filter=True,
)
Thanks for submitting this issue. NOTE:
There is another issue in the queue that appears to deal with this same concern.
That issue also has a PR in the works.
Gonna close this as a duplicate.
Oops yes same issue, missed it when I opened mine. Thanks @chalmerlowe