googleapis/python-bigquery

support adbc APIs in DB-API package

Opened this issue · 3 comments

Is your feature request related to a problem? Please describe.

Getting data out of the DB-API can be slow for large results, especially if the desired format is Arrow, as is the case in polars (pola-rs/polars#18547, pola-rs/polars#17326).

Describe the solution you'd like

I'd love it if our DB-API provided the custom Python methods described in https://arrow.apache.org/adbc/current/python/quickstart.html, such as fetch_arrow_table(), adbc_ingest(), adbc_get_info(), and adbc_get_table_schema(). See: https://arrow.apache.org/adbc/current/python/api/adbc_driver_manager.html#adbc_driver_manager.dbapi.Connection and https://arrow.apache.org/adbc/current/python/api/adbc_driver_manager.html#adbc_driver_manager.dbapi.Cursor
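For illustration, here is a minimal sketch of how those methods are used against a connection that follows the ADBC DB-API extensions. The helper name `query_to_arrow` is hypothetical, and `conn` is assumed to be an `adbc_driver_manager.dbapi.Connection` or anything exposing the same cursor interface; nothing here is specific to BigQuery.

```python
from contextlib import closing


def query_to_arrow(conn, sql):
    """Run `sql` and return the result via the ADBC `fetch_arrow_table()` extension.

    Hypothetical helper: `conn` is assumed to expose the ADBC DB-API
    cursor extensions described in the ADBC Python docs.
    """
    # closing() only relies on cursor.close(), which every DB-API cursor provides.
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        # On a real ADBC cursor this returns a pyarrow.Table, avoiding the
        # row-by-row tuple conversion that fetchall() would perform.
        return cur.fetch_arrow_table()
```

The appeal is that the result stays in Arrow's columnar format end to end, which is what Arrow-native consumers such as polars want.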

Describe alternatives you've considered

A (better?) alternative would be for BigQuery to provide a real C/C++ ADBC driver, which could then be automatically usable from the Python adbc_driver_manager package.

Additional context

I'm not expecting fast movement here, as the ADBC standard/community still feels pretty early days, but I figured it was worth filing for visibility given that polars is integrated with it.

A (better?) alternative would be for BigQuery to provide a real C/C++ ADBC driver, which could then be automatically usable from the Python adbc_driver_manager package.

I'm unsure if you're aware, but it looks like BigQuery does have ADBC drivers implemented in C# and Go. Python wheels were included for the first time in the recent release.

There is still a fair bit to be desired (including a page in the docs), with a to-do list outlined here.

Also, according to the writing new drivers page:

Currently, new drivers can be written in C#, C/C++, Go, and Java. A driver written in C/C++ or Go can be used from either of those languages, as well as C#, Python, R, and Ruby. (C# can experimentally export drivers to the same set of languages as well.)

I haven't installed or tested anything - just observations. Just figured I would give a heads up on it.

Cheers

tswast commented

I was not aware. That's awesome news.

Here is the PyPI package - https://pypi.org/project/adbc-driver-bigquery/

After some tinkering with the low-level API I've also figured out how to read data directly into Polars without PyArrow or the BigQuery client libraries.
This is thanks to the Arrow PyCapsule Interface, implemented by Polars and the ADBC driver manager (specifically on the ArrowArrayStreamHandle returned by stmt.execute_query()).

import adbc_driver_bigquery
import adbc_driver_manager
import polars as pl

PROJECT = "my-project"
db_kwargs = {adbc_driver_bigquery.DatabaseOptions.PROJECT_ID.value: PROJECT}
query = f"SELECT * FROM `{PROJECT}.testing.test` LIMIT 10"

with (
    adbc_driver_bigquery.connect(db_kwargs) as db,
    adbc_driver_manager.AdbcConnection(db) as conn,
    adbc_driver_manager.AdbcStatement(conn) as stmt,
):
    stmt.set_sql_query(query)
    stream_handle, rows = stmt.execute_query()

    df = pl.DataFrame(stream_handle)
    print(df)
    print(f"{rows = }")

# shape: (1, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# └─────┘
# rows = 1
$ pip list
Package              Version
-------------------- -------
adbc-driver-bigquery 1.3.0
adbc-driver-manager  1.3.0
importlib_resources  6.5.2
pip                  24.3.1
polars               1.19.0
setuptools           65.5.0
typing_extensions    4.12.2

The ADBC DB-API layer (which is also where adbc_ingest lives) currently requires PyArrow. I think I'll ask the ADBC people whether this requirement can be relaxed or removed. adbc_ingest also accepts objects that implement the Arrow PyCapsule Interface, so it could be possible to write to BQ with ADBC without PyArrow.

Perhaps the BigQuery client / BigQuery storage client could also implement the Arrow PyCapsule Interface 😉. There is a growing list of adopters.
Could this be possible for methods that already return results in the Arrow memory format? (like google.cloud.bigquery.table.RowIterator.to_arrow)
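As a rough sketch of what that could look like: the PyCapsule stream protocol is just a `__arrow_c_stream__` method, so an adapter can delegate to whatever Arrow object a method like `to_arrow` already returns. The class name `ArrowStreamExportable` is hypothetical, and this assumes the returned object itself implements `__arrow_c_stream__` (as `pyarrow.Table` does in recent PyArrow releases); it is not how the BigQuery client works today.

```python
class ArrowStreamExportable:
    """Hypothetical adapter exposing the Arrow PyCapsule stream protocol."""

    def __init__(self, to_arrow):
        # `to_arrow` is any zero-argument callable returning an object that
        # itself implements __arrow_c_stream__ (assumed: e.g. a pyarrow.Table).
        self._to_arrow = to_arrow

    def __arrow_c_stream__(self, requested_schema=None):
        # Consumers call this dunder to obtain a PyCapsule wrapping an
        # ArrowArrayStream; here the call is simply forwarded to the
        # underlying Arrow object.
        return self._to_arrow().__arrow_c_stream__(requested_schema)
```

Something like `pl.DataFrame(ArrowStreamExportable(row_iterator.to_arrow))` would then be the intended usage, with `row_iterator` standing in for a `RowIterator`. This version still needs PyArrow under the hood; a native implementation in the client would let consumers skip it.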