superlinear-ai/graphchain

Attribute Error from dask 1.1.1


When running your dask delayed decorator example, I hit this error:

Traceback (most recent call last):
  File "./test3.py", line 37, in <module>
    print(result.compute())
  File "/nix/store/60l4c79ba9ss8hd9qpah85rh0kmgbyzh-python3.6-dask-1.1.1/lib/python3.6/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/nix/store/60l4c79ba9ss8hd9qpah85rh0kmgbyzh-python3.6-dask-1.1.1/lib/python3.6/site-packages/dask/base.py", line 395, in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
  File "/nix/store/60l4c79ba9ss8hd9qpah85rh0kmgbyzh-python3.6-dask-1.1.1/lib/python3.6/site-packages/dask/base.py", line 194, in collections_to_dsk
    for opt, (dsk, keys) in groups.items()]))
  File "/nix/store/60l4c79ba9ss8hd9qpah85rh0kmgbyzh-python3.6-dask-1.1.1/lib/python3.6/site-packages/dask/base.py", line 194, in <listcomp>
    for opt, (dsk, keys) in groups.items()]))
  File "/nix/store/1fqfcjjrka11b6np7qgcwsk8ld6bcq2w-python3.6-graphchain-1.0.0/lib/python3.6/site-packages/graphchain/core.py", line 346, in optimize
    dsk = dsk.copy()
AttributeError: 'HighLevelGraph' object has no attribute 'copy'

This was after I fixed some incorrect parameter names in the example.

The code to reproduce this was:

import dask
import graphchain
import pandas as pd

@dask.delayed(pure=True)
def create_dataframe(num_rows, num_cols):
    print('Creating DataFrame...')
    return pd.DataFrame(data=[range(num_cols)]*num_rows)

@dask.delayed(pure=True)
def create_dataframe2(num_rows, num_cols):
    print('Creating DataFrame...')
    return pd.DataFrame(data=[range(num_cols)]*num_rows)

@dask.delayed(pure=True)
def complicated_computation(df, num_quantiles):
    print('Running complicated computation on DataFrame...')
    return df.quantile(q=[i / num_quantiles for i in range(num_quantiles)])

@dask.delayed(pure=True)
def summarise_dataframes(*dfs):
    print('Summing DataFrames...')
    return sum(df.sum().sum() for df in dfs)

df_a = create_dataframe(10_000, 1000)
df_b = create_dataframe2(10_000, 1000)
df_c = complicated_computation(df_a, 2048)
df_d = complicated_computation(df_b, 2048)
result = summarise_dataframes(df_c, df_d)

with dask.config.set(scheduler='sync', delayed_optimize=graphchain.optimize):
    print(result.compute())

I think the optimize function expects a plain dictionary, but what is passed into core.optimize is actually a HighLevelGraph. A HighLevelGraph is not a plain dictionary and, as the traceback shows, has no copy method: http://docs.dask.org/en/latest/high-level-graphs.html#highlevelgraphs

I believe this PR dask/dask#4092 is what changed the underlying Dask graph representation and has broken graphchain.
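
To illustrate the mismatch without depending on dask, here is a minimal sketch. LayeredGraph is a hypothetical stand-in for a HighLevelGraph (a read-only Mapping with no copy method), and optimize is a hypothetical simplification, not graphchain's actual code. It assumes keys are unique across layers. The idea is that dict(dsk) accepts any Mapping, whereas dsk.copy() only works on a real dict.

```python
from collections.abc import Mapping


class LayeredGraph(Mapping):
    """Hypothetical stand-in: a read-only mapping made of named layers."""

    def __init__(self, layers):
        # layers: {layer_name: {key: task}}; keys assumed unique across layers
        self.layers = layers

    def __getitem__(self, key):
        for layer in self.layers.values():
            if key in layer:
                return layer[key]
        raise KeyError(key)

    def __iter__(self):
        for layer in self.layers.values():
            yield from layer

    def __len__(self):
        return sum(len(layer) for layer in self.layers.values())


def optimize(dsk, keys=None):
    """Hypothetical optimizer entry point that tolerates any Mapping."""
    # dsk.copy() would raise AttributeError on LayeredGraph;
    # dict(dsk) flattens any Mapping into a new plain dict instead.
    flat = dict(dsk)
    return flat


graph = LayeredGraph({
    "create": {"df-a": ("create_dataframe", 10, 5)},
    "compute": {"df-c": ("complicated_computation", "df-a", 4)},
})
flat = optimize(graph)
```

The result is a fresh plain dict, so later code can mutate it freely without touching the original graph.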

Thanks for the bug report @CMCDragonkai. We're working on a PR to add support for HighLevelGraphs!

Published graphchain v1.1.0 to PyPI. Let us know if you encounter any other issues!

Since you changed the pytest options, I get this error now:

pytest: error: unrecognized arguments: --cov=graphchain

It seems an additional option was added via setup.cfg, but pytest 4.2.1 does not recognise it.

You'll need the pytest-cov extension [1] for pytest to work with the bundled setup.cfg. Alternatively, you can choose not to use our setup.cfg (we use it for development; it is not part of the graphchain distribution).

[1] https://anaconda.org/search?q=pytest-cov

Should pytest-cov then be part of the list of dependencies specified somewhere?

If you installed graphchain with pip install graphchain, you should have all the required runtime dependencies. The runtime dependencies are listed in setup.py [1].

If you want to modify or contribute to graphchain, then you do indeed need additional dependencies. Those are listed in the local conda environment spec [2]. You can install these with conda env create -f environment.local.yml, followed by source activate graphchain-local-env.

[1] https://github.com/radix-ai/graphchain/blob/master/setup.py#L31
[2] https://github.com/radix-ai/graphchain/blob/master/environment.local.yml