googleapis/python-bigquery

BUG: `bq_to_arrow_field` in `_pandas_helpers.py` always sets `pyarrow.field` to nullable

The current implementation of `bq_to_arrow_field` always creates the `pyarrow.field` as nullable. However, a BigQuery array (a field in `REPEATED` mode) cannot be nullable, so the following error is raised when uploading a pandas DataFrame to a BigQuery table that contains a field in repeated mode.

Stack trace

Traceback (most recent call last):
  File "/project_path/field_trie.py", line 426, in <module>
    main()
  File "/project_path/field_trie.py", line 363, in main
    job = client.load_table_from_dataframe(df, table)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/client.py", line 2793, in load_table_from_dataframe
    _pandas_helpers.dataframe_to_parquet(
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 669, in dataframe_to_parquet
    arrow_table = dataframe_to_arrow(dataframe, bq_schema)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 611, in dataframe_to_arrow
    bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 319, in bq_to_arrow_array
    return pyarrow.ListArray.from_pandas(series, type=arrow_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 1115, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 339, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Cannot append scalar of type struct<order_id: int64, order_date: date32[day], event_timestamps: list<item: timestamp[us, tz=UTC]> not null, items: list<item: struct<item_id: int64, item_name: string, item_price: double>> not null> to builder for type struct<order_id: int64, order_date: date32[day], event_timestamps: list<item: timestamp[us, tz=UTC]>, items: list<item: struct<item_id: int64, item_name: string, item_price: double>>>