Dask Expr Dataframes are no longer instances of dd.core.DataFrame
Closed this issue · 2 comments
andoni-garcia-fgp commented
Describe the issue:
When including Dask Expr in my dependencies, my default Dask DataFrames are no longer instances of dd.core.DataFrame. This leads to subtle bugs in isinstance(ddf, dd.DataFrame) checks, depending on the exact library call used to construct the DataFrame.
Minimal Complete Verifiable Example:
import pandas as pd
import dask.dataframe as dd
test_ddf = dd.from_dict({}, npartitions=1)
type(test_ddf)
> <class 'dask_expr._collection.DataFrame'>
isinstance(test_ddf, dd.core.DataFrame)
> False
isinstance(test_ddf, dd.DataFrame)
> True
test_pdf = pd.DataFrame()
test_ddf = dd.from_pandas(test_pdf, npartitions=1)
isinstance(test_ddf, dd.DataFrame)
> True
isinstance(test_ddf, dd.core.DataFrame)
> False
test_ddf = dd.io.io.from_pandas(test_pdf, npartitions=1)
isinstance(test_ddf, dd.DataFrame)
> False
isinstance(test_ddf, dd.core.DataFrame)
> True
Anything else we need to know?:
Environment:
Installed via poetry:
dask = {extras = ["complete", "distributed"], version = "^2024.5.2"}
dask-expr = "^1.1.2"
[[package]]
name = "dask"
version = "2024.6.2"
[package.extras]
array = ["numpy (>=1.21)"]
complete = ["dask[array,dataframe,diagnostics,distributed]", "lz4 (>=4.3.2)", "pyarrow (>=7.0)", "pyarrow-hotfix"]
dataframe = ["dask-expr (>=1.1,<1.2)", "dask[array]", "pandas (>=1.3)"]
diagnostics = ["bokeh (>=2.4.2)", "jinja2 (>=2.10.3)"]
distributed = ["distributed (==2024.6.2)"]
test = ["pandas[test]", "pre-commit", "pytest", "pytest-cov", "pytest-rerunfailures", "pytest-timeout", "pytest-xdist"]
[[package]]
name = "dask-expr"
version = "1.1.6"
[package.extras]
analyze = ["crick", "distributed"]
phofl commented
This is intended. You should use dd.DataFrame; dd.core is considered private.
phofl commented
Closing this for now