Wrong data types for empty parquet load into BigQuery
Closed this issue · 1 comment
lawrencestfs commented
When loading an empty parquet file into BigQuery (through the job API) with all fields declared as STRING, some fields get inferred as a numeric type (INTEGER or FLOAT), resulting in the following error:
BigQuery error in load operation: Error processing job '[project]:[job]': Provided Schema does not match Table [project]:[dataset].[table]. Field [attribut] has
changed type from STRING to FLOAT
Related to this Stack Overflow question.
Environment details
- Environment: Google Cloud Composer 1.17.9
- Airflow version: 2.1.4
Steps to reproduce
- Create an empty parquet file
- Create a BigQuery table with the same schema as the previous file, but with all fields declared as STRING
- Create a data pipeline using the DataIngestionBase Composition to read the file and load it into the BigQuery table
lawrencestfs commented
After running some tests through the bq CLI, we observed that the problem was the missing schema in the BigQuery load job.
@NiltonDuarte has already issued a hot fix, and a final solution will be implemented on the same branch, fix_bq_load_with_schema.
Basically, we'll provide the table schema to the load job. This will require a new field on the table definition YAML files, called is_metadata,
that will be used to ignore these metadata fields.