flow_selection with 'source.some_key' doesn't work because of pandas.eval()

Question


mskoh52 opened this issue 6 years ago · 1 comment

When creating a dataset with dim_process (and probably also with dim_material and dim_time, although I didn't try), there is a problem if you later try to use the flow_selection parameter when creating a Bundle. Because the dataset's columns are constructed by adding the prefix source. or target. (note the dot), pandas errors out whenever floweaver.dataset.eval_selection gets called: it parses source.foo as attribute access on source, not as the name of a column.
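The failure can be reproduced with pandas alone, without floweaver. This is a minimal sketch assuming only that the flows table ends up with a column literally named source.foo (what the source./target. prefixing produces) next to the plain source column. The exact exception type and message depend on the pandas version, so the sketch just records whichever one is raised.

```python
import pandas as pd

# A flows table shaped like the one floweaver builds after joining
# dim_process: the process attribute lives in a column named "source.foo",
# next to the plain "source" column.
flows = pd.DataFrame({
    "source": ["a", "a", "b", "b", "c"],
    "target": ["b", "c", "b", "d", "d"],
    "value": [10, 20, 5, 5, 20],
    "source.foo": ["fooA", "fooA", "fooB", "fooB", "fooC"],
})

error = None
try:
    # pandas parses "source.foo" as attribute access on the name "source"
    # (which resolves to the "source" column), not as the "source.foo"
    # column, so the expression cannot be evaluated.
    flows.eval('source.foo == "fooB"')
except Exception as exc:  # exact type depends on the pandas version
    error = exc

print(type(error).__name__, error)
```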

For example, if I have created a Dataset like so:

import pandas as pd
from floweaver import *
flows = pd.DataFrame(
    [['a', 'b', 10],
     ['a', 'c', 20],
     ['b', 'b', 5],
     ['b', 'd', 5],
     ['c', 'd', 20]],
    columns=['source', 'target', 'value']
)
processes = pd.DataFrame(
    ['fooA', 'fooB', 'fooC', 'fooD'],
    columns=['foo'],
    index=['a', 'b', 'c', 'd']
)

dataset = Dataset(flows, dim_process=processes)


nodes = {
    'node1': ProcessGroup(['a']),
    'node2': ProcessGroup(['b', 'c']),
    'node3': ProcessGroup(['d']),
    'wp': Waypoint(direction='R')
}
ordering = [['node1'], ['node2', 'wp'], ['node3']]
bundles = [
    Bundle('node1', 'node2'),
    Bundle('node2', 'node2', flow_selection='source.foo == "fooB"', waypoints=['wp']),
    Bundle('node2', 'node3'),
]

sdd = SankeyDefinition(nodes, bundles, ordering)

weave(sdd, dataset).to_widget()

This fails with the following traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-6-04c4391e4f95> in <module>
     33 sdd = SankeyDefinition(nodes, bundles, ordering)
     34 
---> 35 weave(sdd, dataset).to_widget()

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/weave.py in weave(sankey_definition, dataset, measures, link_width, link_color, palette)
     43     # Get the flows selected by the bundles
     44     bundle_flows, unused_flows = dataset.apply_view(
---> 45         sankey_definition.nodes, bundles2, sankey_definition.flow_selection)
     46 
     47     # Calculate the results graph (actual Sankey data)

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in apply_view(self, process_groups, bundles, flow_selection)
     88 
     89     def apply_view(self, process_groups, bundles, flow_selection=None):
---> 90         return _apply_view(self, process_groups, bundles, flow_selection)
     91 
     92     def save(self, filename):

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in _apply_view(dataset, process_groups, bundles, flow_selection)
    191         target = process_groups[bundle.target]
    192         flows, internal_source, internal_target = \
--> 193             find_flows(table, source.selection, target.selection, bundle.flow_selection)
    194         assert len(used_edges.intersection(
    195             flows.index.values)) == 0, 'duplicate bundle'

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in find_flows(flows, source_query, target_query, flow_query, ignore_edges)
    136     """
    137     if flow_query is not None:
--> 138         flows = flows[eval_selection(flows, '', flow_query)]
    139 
    140     if source_query is None and target_query is None:

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in eval_selection(df, column, sel)
     38                        local_dict={},
     39                        global_dict={},
---> 40                        resolvers=(resolver, ))
     41     else:
     42         raise TypeError('Unknown selection type: %s' % type(sel))

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
   3191             kwargs['target'] = self
   3192         kwargs['resolvers'] = kwargs.get('resolvers', ()) + tuple(resolvers)
-> 3193         return _eval(expr, inplace=inplace, **kwargs)
   3194 
   3195     def select_dtypes(self, include=None, exclude=None):

.....
{removed for brevity}
.....

~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/pandas/core/computation/expr.py in visit_Attribute(self, node, **kwargs)
    548 
    549         raise ValueError("Invalid Attribute context {name}"
--> 550                          .format(name=ctx.__name__))
    551 
    552     def visit_Call_35(self, node, side=None, **kwargs):

AttributeError: 'Load' object has no attribute '__name__'

Full traceback here: https://gist.github.com/mskoh52/7054199865c214ad1de8f0e4772582d4

I was able to work around this by editing dataset.py to replace all the dots with underscores (lines 28 and 70-77), and then writing the flow_selection string with underscores as well. I can submit a PR if desired, but I might be missing other places where the dot is important (I haven't looked too extensively at the rest of the code).

Answer 1 · 2019-04-03T16:09:10.000Z

Thanks for reporting this, and sorry for the slow response. Replacing the dots with underscores is a pragmatic solution, but it wouldn't be compatible with the way selections have worked previously, so I'd like to figure out what the problem is and make it work with dots if possible. I'll try to take a look soon; help diagnosing the underlying problem is also welcome.
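For comparison, newer pandas releases (1.0 and later, as far as I know) can reference column names containing dots directly when they are wrapped in backticks. This is a sketch using plain DataFrame.eval. It changes the syntax users write (`source.foo` in backticks instead of bare source.foo), so on its own it would not give the backward compatibility asked for here, and whether it works through floweaver's eval_selection resolver is untested.

```python
import pandas as pd

flows = pd.DataFrame({
    "source": ["a", "b", "c"],
    "value": [10, 5, 20],
    "source.foo": ["fooA", "fooB", "fooC"],
})

# Backticks make pandas treat the whole dotted name as one column reference
# instead of attribute access on "source" (assumes pandas >= 1.0).
mask = flows.eval('`source.foo` == "fooB"')
print(flows[mask])
```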