Attribute Error from dask 1.1.1
When running your dask delayed decorator example, I hit this error:
Traceback (most recent call last):
  File "./test3.py", line 37, in <module>
    print(result.compute())
  File "/nix/store/60l4c79ba9ss8hd9qpah85rh0kmgbyzh-python3.6-dask-1.1.1/lib/python3.6/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/nix/store/60l4c79ba9ss8hd9qpah85rh0kmgbyzh-python3.6-dask-1.1.1/lib/python3.6/site-packages/dask/base.py", line 395, in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
  File "/nix/store/60l4c79ba9ss8hd9qpah85rh0kmgbyzh-python3.6-dask-1.1.1/lib/python3.6/site-packages/dask/base.py", line 194, in collections_to_dsk
    for opt, (dsk, keys) in groups.items()]))
  File "/nix/store/60l4c79ba9ss8hd9qpah85rh0kmgbyzh-python3.6-dask-1.1.1/lib/python3.6/site-packages/dask/base.py", line 194, in <listcomp>
    for opt, (dsk, keys) in groups.items()]))
  File "/nix/store/1fqfcjjrka11b6np7qgcwsk8ld6bcq2w-python3.6-graphchain-1.0.0/lib/python3.6/site-packages/graphchain/core.py", line 346, in optimize
    dsk = dsk.copy()
AttributeError: 'HighLevelGraph' object has no attribute 'copy'
This was after I fixed some incorrect parameter names in the example.
The code to achieve this was:
import dask
import graphchain
import pandas as pd

@dask.delayed(pure=True)
def create_dataframe(num_rows, num_cols):
    print('Creating DataFrame...')
    return pd.DataFrame(data=[range(num_cols)]*num_rows)

@dask.delayed(pure=True)
def create_dataframe2(num_rows, num_cols):
    print('Creating DataFrame...')
    return pd.DataFrame(data=[range(num_cols)]*num_rows)

@dask.delayed(pure=True)
def complicated_computation(df, num_quantiles):
    print('Running complicated computation on DataFrame...')
    return df.quantile(q=[i / num_quantiles for i in range(num_quantiles)])

@dask.delayed(pure=True)
def summarise_dataframes(*dfs):
    print('Summing DataFrames...')
    return sum(df.sum().sum() for df in dfs)

df_a = create_dataframe(10_000, 1000)
df_b = create_dataframe2(10_000, 1000)
df_c = complicated_computation(df_a, 2048)
df_d = complicated_computation(df_b, 2048)
result = summarise_dataframes(df_c, df_d)

with dask.config.set(scheduler='sync', delayed_optimize=graphchain.optimize):
    print(result.compute())
I think the optimize function expects a plain dictionary, but what is actually passed into core.optimize is a HighLevelGraph. A high level graph is not just a simple dictionary, and it has no copy method: http://docs.dask.org/en/latest/high-level-graphs.html#highlevelgraphs
I believe this PR dask/dask#4092 is what changed the underlying Dask graph representation and has broken graphchain.
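To illustrate the failure mode without depending on a particular dask version, here is a minimal sketch. LayeredGraph is a hypothetical stand-in for a layered graph object (it is not dask's HighLevelGraph), and optimize is a hypothetical optimizer entry point, not graphchain's actual code. The assumption is only that the graph behaves like a read-only Mapping, in which case dict(dsk) works where dsk.copy() does not.

```python
from collections.abc import Mapping


class LayeredGraph(Mapping):
    """Hypothetical stand-in for a layered graph: a read-only Mapping with no .copy()."""

    def __init__(self, layers):
        # layers maps a layer name to a plain {key: task} dict
        self.layers = layers

    def __getitem__(self, key):
        for layer in self.layers.values():
            if key in layer:
                return layer[key]
        raise KeyError(key)

    def __iter__(self):
        seen = set()
        for layer in self.layers.values():
            for key in layer:
                if key not in seen:
                    seen.add(key)
                    yield key

    def __len__(self):
        return sum(1 for _ in self)


def optimize(dsk, keys=None):
    """Hypothetical optimizer that accepts any Mapping, not only a dict."""
    # dict(dsk) flattens the mapping into a new mutable dict,
    # so it does not rely on the input having a .copy() method.
    dsk = dict(dsk)
    # ... task rewriting would go here ...
    return dsk


graph = LayeredGraph({"a": {"x": 1}, "b": {"y": (sum, ["x", "x"])}})
flat = optimize(graph)
```

Calling graph.copy() here raises the same kind of AttributeError as in the traceback, while optimize(graph) returns an ordinary dict that can be modified freely.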
Thanks for the bug report @CMCDragonkai. We're working on a PR to add support for HighLevelGraphs!
Published graphchain v1.1.0 to PyPI. Let us know if you encounter any other issues!
Since you changed the pytest options, I get this error now:
pytest: error: unrecognized arguments: --cov=graphchain
It seems an additional option was added via setup.cfg, but this option is not recognised by pytest 4.2.1.
You'll need the pytest-cov extension [1] for pytest to work with the bundled setup.cfg. Or, you can also choose not to use our setup.cfg (which we use for development, not in the graphchain distribution).
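As a quick way to confirm whether the plugin is the missing piece, something like the following sketch could be run in the same environment as pytest. The helper name has_pytest_cov is made up for illustration; the only assumption is that the pytest-cov plugin is importable as the pytest_cov module.

```python
import importlib.util


def has_pytest_cov():
    """Return True if the pytest-cov plugin module can be found in this environment."""
    return importlib.util.find_spec("pytest_cov") is not None


if not has_pytest_cov():
    # Without the plugin, pytest does not know the --cov options set in setup.cfg.
    print("pytest-cov is not installed; --cov=graphchain will be rejected by pytest")
```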
Should pytest-cov then be part of the list of dependencies specified somewhere?
If you installed graphchain with pip install graphchain, you should have all the required runtime dependencies. The runtime dependencies are listed in setup.py [1].
If you want to modify or contribute to graphchain, then you do indeed need additional dependencies. Those are listed in the local conda environment spec [2]. You can install these with conda env create -f environment.local.yml, followed by source activate graphchain-local-env.
[1] https://github.com/radix-ai/graphchain/blob/master/setup.py#L31
[2] https://github.com/radix-ai/graphchain/blob/master/environment.local.yml