geopandas/dask-geopandas

BUG: failure if manually specifying engine="pyarrow" in to_parquet

jorisvandenbossche opened this issue · 1 comment

I just noticed that even when the argument engine="pyarrow" is explicitly provided to to_parquet(), the write still fails with the same error.

import numpy as np
import pandas as pd
import geopandas as gpd
import dask_geopandas as dgpd

# pd.util.testing.makeDataFrame() is deprecated; build an equivalent
# random-float frame directly
dft = pd.DataFrame(np.random.randn(30, 4), columns=list("ABCD"))
dft["geometry"] = gpd.points_from_xy(dft.A, dft.B)
df = gpd.GeoDataFrame(dft)
df = dgpd.from_geopandas(df, npartitions=1)
df.to_parquet("mydf.parquet", engine="pyarrow")

Originally posted by @FlorisCalkoen in #198 (comment)

Ah, that is "expected", because passing engine="pyarrow" makes dask use its built-in "pyarrow" engine, whereas dask-geopandas extends that engine to handle the geometry dtype properly.

But of course, we should prevent people from accidentally passing engine="pyarrow" and thus silently overriding our own engine. It seems we need something more elaborate than the simple partial to do that:

to_parquet = partial(dd.to_parquet, engine=GeoArrowEngine)
to_parquet.__doc__ = dd.to_parquet.__doc__