load_table_from_dataframe breaks with Arrow list fields when the list is backed by a ChunkedArray.
cvm-a opened this issue · 3 comments
Environment details
- OS type and version: MacOS Darwin Kernel Version 22.1.0
- Python version: 3.11
- pip version: 23.2.1
- google-cloud-bigquery version: 3.15.0
Steps to reproduce
- Create an Arrow-backed DataFrame with a large list field.
- Create a google.cloud.bigquery Client.
- Call Client.load_table_from_dataframe on this DataFrame.
Code example
# example
import pandas as pd
import pyarrow as pa
from google.cloud import bigquery as gbq

client = gbq.Client(
    project=<project_id>,
    credentials=<credentials>,
    location=<location>,
)

# pd.ArrowDtype keeps the column Arrow-backed; for a column this large,
# pandas may store it internally as a pyarrow ChunkedArray
df = pd.DataFrame(
    {"x": pa.array(pd.Series([[2.2] * 5] * 10000000)).to_pandas(types_mapper=pd.ArrowDtype)}
)

client.load_table_from_dataframe(
    df, "temporary_tables.chunked_array_error"
)
Stack trace
File "/Users/<redacted>", line 250, in create_table_from_dataframe
load_job = self.client.load_table_from_dataframe(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/<redacted>/lib/python3.11/site-packages/google/cloud/bigquery/client.py", line 2671, in load_table_from_dataframe
new_job_config.schema = _pandas_helpers.dataframe_to_bq_schema(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/<redacted>/lib/python3.11/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 465, in dataframe_to_bq_schema
bq_schema_out = augment_schema(dataframe, bq_schema_out)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/<redacted>lib/python3.11/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 500, in augment_schema
arrow_table.values.type.id
^^^^^^^^^^^^^^^^^^
AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'values'
The easiest fix is to replace arrow_table.values.type.id with arrow_table.type.value_type.id.
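A minimal sketch of the difference, using illustrative data: a list-typed Array exposes a flattened .values, a ChunkedArray does not, while .type.value_type works for both.

import pyarrow as pa

arr = pa.array([[2.2] * 5] * 3)          # pyarrow.Array of list<double>
chunked = pa.chunked_array([arr, arr])   # pyarrow.ChunkedArray of list<double>

arr.values.type.id            # works: list Arrays expose flattened .values
# chunked.values              # AttributeError: ChunkedArray has no .values
chunked.type.value_type.id    # works for Array and ChunkedArray alike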
I'm unable to reproduce the issue with the code you provided - the BQ table was created successfully. I wonder which version of PyArrow and Pandas you are using? I'm using pyarrow==15.0.0 and pandas==2.2.0. My guess is we don't support converting ChunkedArray into BigQuery types, but your pandas/pyarrow decided to use it.
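For what it's worth, a sketch of how that can happen (the dtype here is illustrative): with an ArrowDtype-backed Series, pyarrow.array() can return the Series' underlying ChunkedArray via the __arrow_array__ protocol rather than a plain Array, which is the object the schema helper then receives.

import pandas as pd
import pyarrow as pa

s = pd.Series([[2.2] * 5] * 3, dtype=pd.ArrowDtype(pa.list_(pa.float64())))

# pa.array() is documented to return a ChunkedArray when the input's
# __arrow_array__ protocol method returns one, as pandas' Arrow-backed
# extension array may do
print(type(pa.array(s)))  # may be pyarrow.lib.ChunkedArray, not Array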
I will close this issue now. For future readers interested in loading an Arrow ChunkedArray, please consider using Arrow's Table.from_arrays() to convert it into a table.
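For reference, a sketch of that conversion (the column name "x" is illustrative):

import pyarrow as pa

chunked = pa.chunked_array([pa.array([[2.2] * 5] * 3)])

# Table.from_arrays accepts ChunkedArray columns directly
table = pa.Table.from_arrays([chunked], names=["x"])

# alternatively, flatten the chunks into a single contiguous Array first
flat = chunked.combine_chunks()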