googleapis/python-bigquery

BUG: `read_pandas` Error: 'values' Attribute Missing in `ChunkedArray`

chelsea-lin opened this issue · 4 comments

Environment details

  • OS type and version: ubuntu
  • Python version: Python 3.9.18
  • pip version: pip 24.1.2
  • google-cloud-bigquery version: 3.16.0
  • pandas version: 1.5.0

Code example

    import pyarrow as pa
    import bigframes.pandas as bpd


    def test_getitem_w_struct_array():
        pa_struct = pa.struct(
            [
                ("name", pa.string()),
                ("age", pa.int64()),
            ]
        )
        data: list[list[dict]] = [
            [
                {"name": "Alice", "age": 30},
                {"name": "Bob", "age": 25},
            ],
            [
                {"name": "Charlie", "age": 35},
                {"name": "David", "age": 40},
                {"name": "Eva", "age": 28},
            ],
            [],
            [{"name": "Frank", "age": 50}],
        ]
        # The test fails on the next line with the AttributeError shown below.
        s = bpd.Series(data, dtype=bpd.ArrowDtype(pa.list_(pa_struct)))

Stack trace

tests/system/small/operations/test_strings.py:655:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
bigframes/core/log_adapter.py:56: in wrapper
    return method(*args, **kwargs)
bigframes/series.py:75: in __init__
    super().__init__(*args, **kwargs)
bigframes/operations/base.py:114: in __init__
    data = indexes.Index(data, dtype=dtype, name=name, session=session)
bigframes/core/indexes/base.py:89: in __new__
    block = df.DataFrame(pd_df, session=session)._block
bigframes/core/log_adapter.py:56: in wrapper
    return method(*args, **kwargs)
bigframes/dataframe.py:197: in __init__
    self._block = bigframes.pandas.read_pandas(pd_dataframe)._get_block()
bigframes/pandas/__init__.py:617: in read_pandas
    return global_session.with_default_session(
bigframes/core/global_session.py:113: in with_default_session
    return func(get_global_session(), *args, **kwargs)
bigframes/session/__init__.py:1130: in read_pandas
    return self._read_pandas(pandas_dataframe, "read_pandas")
bigframes/session/__init__.py:1151: in _read_pandas
    return self._read_pandas_load_job(pandas_dataframe, api_name)
bigframes/session/__init__.py:1232: in _read_pandas_load_job
    load_job = self.bqclient.load_table_from_dataframe(
.nox/system-3-9/lib/python3.9/site-packages/google/cloud/bigquery/client.py:2671: in load_table_from_dataframe
    new_job_config.schema = _pandas_helpers.dataframe_to_bq_schema(
.nox/system-3-9/lib/python3.9/site-packages/google/cloud/bigquery/_pandas_helpers.py:465: in dataframe_to_bq_schema
    bq_schema_out = augment_schema(dataframe, bq_schema_out)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

dataframe = <[ValueError('setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.') raised in repr()] DataFrame object at 0x153b8e541970>
current_bq_schema = [SchemaField('bigframes_unnamed_index', None, 'NULLABLE', None, None, (), None), SchemaField('rowid', 'INTEGER', 'NULLABLE', None, None, (), None)]

    def augment_schema(dataframe, current_bq_schema):
        """Try to deduce the unknown field types and return an improved schema.
    
        This function requires ``pyarrow`` to run. If all the missing types still
        cannot be detected, ``None`` is returned. If all types are already known,
        a shallow copy of the given schema is returned.
    
        Args:
            dataframe (pandas.DataFrame):
                DataFrame for which some of the field types are still unknown.
            current_bq_schema (Sequence[google.cloud.bigquery.schema.SchemaField]):
                A BigQuery schema for ``dataframe``. The types of some or all of
                the fields may be ``None``.
        Returns:
            Optional[Sequence[google.cloud.bigquery.schema.SchemaField]]
        """
        # pytype: disable=attribute-error
        augmented_schema = []
        unknown_type_fields = []
        for field in current_bq_schema:
            if field.field_type is not None:
                augmented_schema.append(field)
                continue
    
            arrow_table = pyarrow.array(dataframe.reset_index()[field.name])
    
            if pyarrow.types.is_list(arrow_table.type):
                # `pyarrow.ListType`
                detected_mode = "REPEATED"
                detected_type = _pyarrow_helpers.arrow_scalar_ids_to_bq(
>                   arrow_table.values.type.id
                )
E               AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'values'

.nox/system-3-9/lib/python3.9/site-packages/google/cloud/bigquery/_pandas_helpers.py:500: AttributeError

Looks like an incompatibility with the latest pyarrow. I wonder why prerelease tests didn't catch it?

This test is constructing an array<struct> object, which is new to BigFrames. I am not sure if python-bigquery has any similar tests.

Hey @sycai, could I assign this task to you if you have any bandwidth?

Sure!