Mistaken documentation for dask_geopandas.read_parquet
barbuz opened this issue · 2 comments
I am trying to use the `read_parquet` function in my code, and I've run into some trouble: the documentation says one thing, but the code seems to do something different.
These are the two discrepancies I have found, but there could be others:
- The `engine` parameter cannot really be set. `dask-geopandas` uses a custom `GeoArrowEngine` engine, and trying to specify anything else raises an exception.
- For `split_row_groups`, the documentation says: "Default is True if a _metadata file is available or if the dataset is composed of a single file (otherwise default is False)." However, the code does not appear to compute a default at all; it just passes the value directly to `dask.dataframe.read_parquet`, which defaults to False in all cases.
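To make the second discrepancy concrete, here is a minimal sketch contrasting the default described in the quoted docstring with the behaviour reported above. Both helper functions, and their `has_metadata_file` / `n_files` inputs, are hypothetical; neither dask nor dask-geopandas is assumed to expose anything like them.

```python
from typing import Optional


def documented_default(
    split_row_groups: Optional[bool], has_metadata_file: bool, n_files: int
) -> bool:
    """Hypothetical helper: the default described in the quoted docstring."""
    if split_row_groups is not None:
        # An explicit value from the caller is used unchanged.
        return split_row_groups
    # Documented rule: True if a _metadata file exists or the dataset is a
    # single file, otherwise False.
    return has_metadata_file or n_files == 1


def reported_default(split_row_groups: Optional[bool]) -> bool:
    """Hypothetical helper: the behaviour reported in the issue, where the
    value is passed through unchanged and upstream falls back to False."""
    return False if split_row_groups is None else split_row_groups


# Compare the two for a few hypothetical dataset layouts.
for has_meta, n_files in [(True, 10), (False, 1), (False, 10)]:
    print(has_meta, n_files, documented_default(None, has_meta, n_files), reported_default(None))
```

The two only agree for a multi-file dataset without a `_metadata` file.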
I think the `read_parquet` docstring is just copied directly from `dask.dataframe.read_parquet` at `dask-geopandas/dask_geopandas/io/parquet.py`, line 118 (commit bc3af63).
Your first point is correct: I think only the custom engine is allowed in dask-geopandas.
For your second point, maybe raise an issue in dask/dask if you think that documentation isn't accurate there too. I know there's been some churn around that parameter recently and it may be out of date.
As for what to do here, I'm not sure. dask-geopandas mostly aligns with `dask.dataframe.read_parquet`, so it's nice to pick up changes from there automatically. Dask does include a `derived_from` decorator that can be used to copy over docstrings, with some control over things like "unused arguments": https://github.com/dask/dask/blob/34a1e88bb3f6196361f398ddab55e59d315d8d40/dask/utils.py#L818. Perhaps that could be used to add a caveat about the engine.
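As a rough illustration of that idea, the sketch below is a simplified, standard-library-only stand-in for a `derived_from`-style decorator: it copies a docstring, prepends a "copied from" note, and adds a caveat for arguments the wrapper does not honour, such as `engine`. It is not dask's `derived_from`, whose interface and output format differ; `copy_doc_from`, `upstream_read_parquet` and `geo_read_parquet` are hypothetical names.

```python
from typing import Callable, Iterable


def copy_doc_from(original: Callable, unsupported: Iterable[str] = ()) -> Callable:
    """Hypothetical, simplified derived_from-style decorator (not dask's)."""

    def decorator(func: Callable) -> Callable:
        lines = [f"This docstring was copied from {original.__name__}."]
        unsupported_list = list(unsupported)
        if unsupported_list:
            # Caveat listing arguments that the wrapper does not honour.
            lines.append("Unsupported arguments: " + ", ".join(unsupported_list) + ".")
        # Replace the wrapper's docstring with the note plus the original text.
        func.__doc__ = "\n".join(lines) + "\n\n" + (original.__doc__ or "")
        return func

    return decorator


def upstream_read_parquet(path, columns=None, engine="auto", split_row_groups=None):
    """Read a Parquet dataset. (Stand-in for the upstream docstring.)"""


@copy_doc_from(upstream_read_parquet, unsupported=["engine"])
def geo_read_parquet(path, columns=None, engine=None, split_row_groups=None):
    """This placeholder docstring is replaced by the decorator."""
    # The engine argument is ignored here; the real library uses its own engine.
    return upstream_read_parquet(path, columns=columns, split_row_groups=split_row_groups)


print(geo_read_parquet.__doc__)
```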
I see, thank you. What confused me is that the dask documentation for `split_row_groups` differs from the dask-geopandas one. But, as you said, it may just be out of date.
I've noticed the effect of the `derived_from` decorator: some pages (example) have a disclaimer saying that the docstring was copied from somewhere else. Using the decorator for `read_parquet` and `to_parquet` should make the same disclaimer appear there, but I believe that would involve rewriting the function definitions so that the arguments used are listed explicitly rather than just packed into `*args` and `**kwargs`.
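Why explicit arguments matter can be sketched with `inspect.signature`, assuming unused arguments are detected by comparing named parameters between the original function and the wrapper: a wrapper that only declares `*args` and `**kwargs` exposes no names, so every upstream argument looks unsupported. The function names below are hypothetical.

```python
import inspect
from typing import Callable, List


def named_params(func: Callable) -> List[str]:
    """Explicitly named parameters; *args and **kwargs carry no names."""
    return [
        name
        for name, p in inspect.signature(func).parameters.items()
        if p.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    ]


def apparently_unsupported(original: Callable, wrapper: Callable) -> List[str]:
    """Parameters of `original` that a name comparison cannot find on `wrapper`."""
    wrapper_params = set(named_params(wrapper))
    return [n for n in named_params(original) if n not in wrapper_params]


# Hypothetical upstream reader and two wrapper styles.
def upstream_read_parquet(path, columns=None, filters=None, split_row_groups=None):
    """Stand-in for the upstream function."""


def packed_wrapper(*args, **kwargs):
    return upstream_read_parquet(*args, **kwargs)


def explicit_wrapper(path, columns=None, filters=None, split_row_groups=None):
    return upstream_read_parquet(
        path, columns=columns, filters=filters, split_row_groups=split_row_groups
    )


print(apparently_unsupported(upstream_read_parquet, packed_wrapper))
# ['path', 'columns', 'filters', 'split_row_groups']
print(apparently_unsupported(upstream_read_parquet, explicit_wrapper))
# []
```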