support adbc APIs in DB-API package
Is your feature request related to a problem? Please describe.
Getting data out of the DB-API can be slow for large results, especially if the desired format is Arrow, as is the case in polars (pola-rs/polars#18547, pola-rs/polars#17326)
Describe the solution you'd like
I'd love it if our DB-API provided the custom Python methods described in https://arrow.apache.org/adbc/current/python/quickstart.html, such as fetch_arrow_table(), adbc_ingest(), adbc_get_info(), and adbc_get_table_schema(). See: https://arrow.apache.org/adbc/current/python/api/adbc_driver_manager.html#adbc_driver_manager.dbapi.Connection and https://arrow.apache.org/adbc/current/python/api/adbc_driver_manager.html#adbc_driver_manager.dbapi.Cursor
Describe alternatives you've considered
A (better?) alternative would be for BigQuery to provide a real C/C++ ADBC driver, which could then be automatically usable from the Python adbc_driver_manager package.
Additional context
I'm not expecting fast movement here, as the ADBC standard/community still feels like it is in its early days, but I figured it was worth filing for visibility given that polars is integrated with it.
A (better?) alternative would be for BigQuery to provide a real C/C++ ADBC driver, which could then be automatically usable from the Python adbc_driver_manager package.
I'm unsure if you're aware, but it looks like BigQuery does have ADBC drivers implemented in C# and Go. Python wheels were included for the first time in the recent release.
There is still a fair bit to be desired (including a page in the docs), with a to do list outlined here.
Also, according to the writing new drivers page:
Currently, new drivers can be written in C#, C/C++, Go, and Java. A driver written in C/C++ or Go can be used from either of those languages, as well as C#, Python, R, and Ruby. (C# can experimentally export drivers to the same set of languages as well.)
I haven't installed or tested anything - just observations. Just figured I would give a heads up on it.
Cheers
I was not aware. That's awesome news.
Here is the PyPI package: https://pypi.org/project/adbc-driver-bigquery/
After some tinkering with the low-level API I've also figured out how to read data directly into Polars without PyArrow or the BigQuery client libraries.
This is thanks to the Arrow PyCapsule Interface, implemented by Polars and the ADBC driver manager (specifically on the ArrowArrayStreamHandle returned by stmt.execute_query()):
import adbc_driver_bigquery
import adbc_driver_manager
import polars as pl

PROJECT = "my-project"
db_kwargs = {adbc_driver_bigquery.DatabaseOptions.PROJECT_ID.value: PROJECT}
query = f"SELECT * FROM `{PROJECT}.testing.test` LIMIT 10"

with (
    adbc_driver_bigquery.connect(db_kwargs) as db,
    adbc_driver_manager.AdbcConnection(db) as conn,
    adbc_driver_manager.AdbcStatement(conn) as stmt,
):
    stmt.set_sql_query(query)
    stream_handle, rows = stmt.execute_query()
    df = pl.DataFrame(stream_handle)
    print(df)
    print(f"{rows = }")
# shape: (1, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# └─────┘
# rows = 1
$ pip list
Package              Version
-------------------- -------
adbc-driver-bigquery 1.3.0
adbc-driver-manager  1.3.0
importlib_resources  6.5.2
pip                  24.3.1
polars               1.19.0
setuptools           65.5.0
typing_extensions    4.12.2
The DB-API layer (which is also where adbc_ingest lives) currently requires PyArrow. I think I'll ask the ADBC people whether this requirement can be relaxed or removed. adbc_ingest also supports the Arrow PyCapsule Interface, so it could also be possible to write to BigQuery through ADBC without PyArrow.
Perhaps the BigQuery client / BigQuery storage client could also implement the Arrow PyCapsule Interface; there is a growing list.
Could this be possible for methods that already return results in the Arrow memory format (like google.cloud.bigquery.table.RowIterator.to_arrow)?
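To make the idea concrete, here is a rough sketch, not anything the BigQuery client provides today. It is a thin adapter that forwards `__arrow_c_stream__` (the stream-export method of the Arrow PyCapsule Interface) to whatever Arrow object a to_arrow()-style callable returns. `ArrowStreamExportable` and `FakeArrowTable` are hypothetical names. The stand-in table is there only so the snippet runs without credentials or dependencies; a real pyarrow.Table (pyarrow 14 or newer) exposes the same method.

```python
from typing import Any, Callable, Optional


class ArrowStreamExportable:
    """Hypothetical adapter: exposes the Arrow PyCapsule stream method by
    delegating to the Arrow object produced by a to_arrow()-style callable."""

    def __init__(self, to_arrow: Callable[[], Any]) -> None:
        self._to_arrow = to_arrow

    def __arrow_c_stream__(self, requested_schema: Optional[object] = None) -> object:
        # Build the Arrow data only when a consumer asks for the stream,
        # then let the Arrow object perform the actual capsule export.
        return self._to_arrow().__arrow_c_stream__(requested_schema)


class FakeArrowTable:
    """Stand-in for pyarrow.Table so the sketch runs on its own."""

    def __arrow_c_stream__(self, requested_schema: Optional[object] = None) -> object:
        # A real Arrow table would return a PyCapsule here.
        return "capsule-placeholder"


# Calling the class acts as the to_arrow()-style callable in this demo.
wrapped = ArrowStreamExportable(FakeArrowTable)
exported = wrapped.__arrow_c_stream__()
print(exported)
```

With the real client, the callable would presumably be something like a RowIterator's to_arrow method, and the wrapped object could then be handed to a PyCapsule-aware consumer such as pl.DataFrame(...). Implementing the method directly on the client's result classes would make the adapter unnecessary.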