dask/dask-expr

Dask Expr Dataframes are no longer instances of dd.core.DataFrame

Closed this issue · 2 comments

Describe the issue:

When Dask Expr is included in my dependencies, the Dask DataFrames I create by default are no longer instances of dd.core.DataFrame. This leads to subtle bugs in isinstance checks: whether isinstance(ddf, dd.DataFrame) or isinstance(ddf, dd.core.DataFrame) returns True depends on the exact library call used to construct the DataFrame.

Minimal Complete Verifiable Example:

import pandas as pd
import dask.dataframe as dd

test_ddf = dd.from_dict({}, npartitions=1)
type(test_ddf)
> <class 'dask_expr._collection.DataFrame'>
isinstance(test_ddf, dd.core.DataFrame)
> False
isinstance(test_ddf, dd.DataFrame)
> True

test_pdf = pd.DataFrame()
test_ddf = dd.from_pandas(test_pdf, npartitions=1)
isinstance(test_ddf, dd.DataFrame)
> True
isinstance(test_ddf, dd.core.DataFrame)
> False

test_ddf = dd.io.io.from_pandas(test_pdf, npartitions=1)
isinstance(test_ddf, dd.DataFrame)
> False
isinstance(test_ddf, dd.core.DataFrame)
> True

Environment:

Installed via poetry. From pyproject.toml:

dask = {extras = ["complete", "distributed"], version = "^2024.5.2"}
dask-expr = "^1.1.2"

Resolved versions in poetry.lock:

[[package]]
name = "dask"
version = "2024.6.2"

[package.extras]
array = ["numpy (>=1.21)"]
complete = ["dask[array,dataframe,diagnostics,distributed]", "lz4 (>=4.3.2)", "pyarrow (>=7.0)", "pyarrow-hotfix"]
dataframe = ["dask-expr (>=1.1,<1.2)", "dask[array]", "pandas (>=1.3)"]
diagnostics = ["bokeh (>=2.4.2)", "jinja2 (>=2.10.3)"]
distributed = ["distributed (==2024.6.2)"]
test = ["pandas[test]", "pre-commit", "pytest", "pytest-cov", "pytest-rerunfailures", "pytest-timeout", "pytest-xdist"]


[[package]]
name = "dask-expr"
version = "1.1.6"

[package.extras]
analyze = ["crick", "distributed"]

This is intended. You should use dd.DataFrame; dd.core is considered private.

Closing this for now