flow_selection with 'source.some_key' doesn't work because of pandas.eval()
mskoh52 opened this issue · 1 comments
When creating a dataset with dim_process (and probably also with dim_material and dim_time, although I didn't try), there is a problem if you later try to use the flow_selection param when creating a Bundle. Because the dataset is constructed by adding the prefixes 'source.' and 'target.' (note the dot) to the column names, pandas gets angry whenever floweaver.dataset.eval_selection gets called.
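To illustrate why the dot matters, here is a minimal sketch using only the standard library's ast module (not floweaver or pandas themselves). It assumes the selection string is parsed as a Python expression, which is what the traceback further down suggests. The helper name left_operand_kind is made up for this example.

```python
import ast

def left_operand_kind(expr):
    """Return the AST node type of the left-hand side of a comparison."""
    tree = ast.parse(expr, mode="eval")
    # tree.body is the Compare node; .left is whatever sits before "=="
    return type(tree.body.left).__name__

# With a dot, Python sees attribute access: the name "source", then ".foo"
print(left_operand_kind('source.foo == "fooB"'))  # Attribute

# With an underscore, it is a single plain name
print(left_operand_kind('source_foo == "fooB"'))  # Name
```

So a column literally named 'source.foo' cannot be looked up as one name; the expression evaluator has to resolve 'source' first and then fetch 'foo' from it.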
For example, if I have created a Dataset like so:
import pandas as pd
from floweaver import *

flows = pd.DataFrame(
    [['a', 'b', 10],
     ['a', 'c', 20],
     ['b', 'b', 5],
     ['b', 'd', 5],
     ['c', 'd', 20]],
    columns=['source', 'target', 'value']
)

processes = pd.DataFrame(
    ['fooA', 'fooB', 'fooC', 'fooD'],
    columns=['foo'],
    index=['a', 'b', 'c', 'd']
)

dataset = Dataset(flows, dim_process=processes)

nodes = {
    'node1': ProcessGroup(['a']),
    'node2': ProcessGroup(['b', 'c']),
    'node3': ProcessGroup(['d']),
    'wp': Waypoint(direction='R')
}

ordering = [['node1'], ['node2', 'wp'], ['node3']]

bundles = [
    Bundle('node1', 'node2'),
    Bundle('node2', 'node2', flow_selection='source.foo == "fooB"', waypoints=['wp']),
    Bundle('node2', 'node3'),
]

sdd = SankeyDefinition(nodes, bundles, ordering)

weave(sdd, dataset).to_widget()
This fails with the following traceback:
AttributeError Traceback (most recent call last)
<ipython-input-6-04c4391e4f95> in <module>
33 sdd = SankeyDefinition(nodes, bundles, ordering)
34
---> 35 weave(sdd, dataset).to_widget()
~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/weave.py in weave(sankey_definition, dataset, measures, link_width, link_color, palette)
43 # Get the flows selected by the bundles
44 bundle_flows, unused_flows = dataset.apply_view(
---> 45 sankey_definition.nodes, bundles2, sankey_definition.flow_selection)
46
47 # Calculate the results graph (actual Sankey data)
~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in apply_view(self, process_groups, bundles, flow_selection)
88
89 def apply_view(self, process_groups, bundles, flow_selection=None):
---> 90 return _apply_view(self, process_groups, bundles, flow_selection)
91
92 def save(self, filename):
~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in _apply_view(dataset, process_groups, bundles, flow_selection)
191 target = process_groups[bundle.target]
192 flows, internal_source, internal_target = \
--> 193 find_flows(table, source.selection, target.selection, bundle.flow_selection)
194 assert len(used_edges.intersection(
195 flows.index.values)) == 0, 'duplicate bundle'
~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in find_flows(flows, source_query, target_query, flow_query, ignore_edges)
136 """
137 if flow_query is not None:
--> 138 flows = flows[eval_selection(flows, '', flow_query)]
139
140 if source_query is None and target_query is None:
~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/floweaver/dataset.py in eval_selection(df, column, sel)
38 local_dict={},
39 global_dict={},
---> 40 resolvers=(resolver, ))
41 else:
42 raise TypeError('Unknown selection type: %s' % type(sel))
~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
3191 kwargs['target'] = self
3192 kwargs['resolvers'] = kwargs.get('resolvers', ()) + tuple(resolvers)
-> 3193 return _eval(expr, inplace=inplace, **kwargs)
3194
3195 def select_dtypes(self, include=None, exclude=None):
.....
{removed for brevity}
.....
~/.pyenv/versions/3.6.7/envs/flotest/lib/python3.6/site-packages/pandas/core/computation/expr.py in visit_Attribute(self, node, **kwargs)
548
549 raise ValueError("Invalid Attribute context {name}"
--> 550 .format(name=ctx.__name__))
551
552 def visit_Call_35(self, node, side=None, **kwargs):
AttributeError: 'Load' object has no attribute '__name__'
Full traceback here: https://gist.github.com/mskoh52/7054199865c214ad1de8f0e4772582d4
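A side note on the last line of the traceback: the pandas code shown there is in the middle of raising a ValueError, but formatting the message reads ctx.__name__, and ctx appears to be an instance of ast.Load rather than the class. The sketch below reproduces just that detail with the standard library. It makes no claim about the rest of the pandas internals, and the describe helper is hypothetical.

```python
import ast

tree = ast.parse("source.foo", mode="eval")
ctx = tree.body.ctx  # the Attribute node's context: an *instance* of ast.Load

def describe(ctx):
    """Mimic reading ctx.__name__ and report what happens."""
    try:
        return ctx.__name__
    except AttributeError as exc:
        return "AttributeError: %s" % exc

# Instances do not carry __name__; only the class does
print(describe(ctx))
print(type(ctx).__name__)  # Load
```

If that reading is right, the AttributeError is masking the "Invalid Attribute context" ValueError that pandas meant to raise, so the real complaint is about the attribute access itself.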
I was able to solve this by editing dataset.py and replacing all the dots with underscores on lines 28 and 70-77, then replacing the dots in the flow_selection string with underscores as well. I can submit a PR if desired, but I might be missing other places where the dot is important without realizing it (I haven't looked too extensively at the rest of the code).
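For the selection-string half of that workaround, a rough sketch of the rewrite could look like the following. The helper underscore_selection is hypothetical and not part of floweaver. It only handles the 'source.' and 'target.' prefixes mentioned above, and the joined column names in the dataset would need the same renaming for the query to match.

```python
import re

# Hypothetical helper: matches "source." or "target." when used as a prefix
# directly in front of an identifier.
_PREFIX_DOT = re.compile(r"\b(source|target)\.(?=[A-Za-z_])")

def underscore_selection(expr):
    """Rewrite 'source.foo' / 'target.foo' as 'source_foo' / 'target_foo'."""
    return _PREFIX_DOT.sub(r"\1_", expr)

print(underscore_selection('source.foo == "fooB"'))
# source_foo == "fooB"
```

This is deliberately naive: it would also rewrite a matching prefix that happens to appear inside a quoted string literal, so it is only suitable as a stopgap.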
Thanks for reporting this, and sorry for the slow response. Replacing the dots with underscores is a pragmatic solution, but it wouldn't be compatible with the way this has worked previously, so I'd like to figure out what the underlying problem is and make it work with dots if possible. I'll try to take a look soon; help with diagnosing the problem is also welcome.