BUG: `bq_to_arrow_field` in `_pandas_helpers.py` always sets `pyarrow.field` to nullable
Closed this issue · 0 comments
xyloid commented
The current implementation of `bq_to_arrow_field` always sets `pyarrow.field` to nullable. However, a BigQuery array (a field in repeated mode) cannot be nullable, so the following error is raised when uploading a pandas DataFrame to a BigQuery table that contains a field in repeated mode.
Stack trace
Traceback (most recent call last):
  File "/project_path/field_trie.py", line 426, in <module>
    main()
  File "/project_path/field_trie.py", line 363, in main
    job = client.load_table_from_dataframe(df, table)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/client.py", line 2793, in load_table_from_dataframe
    _pandas_helpers.dataframe_to_parquet(
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 669, in dataframe_to_parquet
    arrow_table = dataframe_to_arrow(dataframe, bq_schema)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 611, in dataframe_to_arrow
    bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 319, in bq_to_arrow_array
    return pyarrow.ListArray.from_pandas(series, type=arrow_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 1115, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 339, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Cannot append scalar of type struct<order_id: int64, order_date: date32[day], event_timestamps: list<item: timestamp[us, tz=UTC]> not null, items: list<item: struct<item_id: int64, item_name: string, item_price: double>> not null> to builder for type struct<order_id: int64, order_date: date32[day], event_timestamps: list<item: timestamp[us, tz=UTC]>, items: list<item: struct<item_id: int64, item_name: string, item_price: double>>>