re-compute parts of graph that are invalidated by changed placeholders?
pchalasani opened this issue · 5 comments
I'm trying to do the following sequence:
- (a) define some context C (by setting values of some placeholders)
- (b) compute some op P in a graph G with context C
- (c) now of course context C contains the state of the graph G, including all intermediate ops
- (d) update some placeholder(s) in C to new values
- (e) compute some op Q (could be same as P above, but needn't be) in graph G, in updated context C
Now when computing Q, I want the graph to re-compute any ops that were invalidated by the updated placeholders in step (d).
But this doesn't happen because the context C is fully respected, i.e. all ops whose values are in C are taken as-is, even if they are invalid given the new values of the updated placeholders. Here's a toy example to show this:
import pythonflow as pf
import random
random.seed(1)
gr = pf.Graph()
with gr as graph:
    b = pf.placeholder(name='b')
    uniform = pf.func_op(random.uniform, 0, b, name='uniform')  # depends on b
    scaled_uniform = uniform * 10
context = dict(b=1.0)
graph(scaled_uniform, context)
context[b] = 2.0 # update b to new value
# below I'd like 'uniform' to be re-calculated since the 'uniform' op
# depends on "b", which has changed
graph(scaled_uniform, context)
# but it does NOT recompute it since "uniform" is
# already in the context, and it uses its value
Is there some way to get the behavior I want?
In principle, you could determine the dependency tree of the graph and remove any operation from the context that depends on the value(s) you are changing. In practice, that is a bit more tricky because the resolution of dependencies may depend on the values in the context. For example, the pf.conditional and pf.try_ operations evaluate different dependency trees depending on whether a condition is true or an operation fails, respectively.
The simplest solution may be to copy the values you are interested in to a new context. Although I understand that may be cumbersome for big graphs.
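A minimal sketch of the copy-to-a-new-context idea, assuming the context is a plain dict. The helper name fresh_context, the string keys, and the cached numbers are made up for illustration and are not part of pythonflow:

```python
def fresh_context(context, keep):
    """Return a new dict holding only the entries of `context` whose key is in `keep`."""
    keep = set(keep)
    return {key: value for key, value in context.items() if key in keep}


# Illustrative stale context after a first evaluation: the placeholder value
# plus cached intermediate results (the numbers are invented).
old_context = {"b": 1.0, "uniform": 0.13, "scaled_uniform": 1.3}

# Carry over only the placeholder, then overwrite it with the new value.
new_context = fresh_context(old_context, keep=["b"])
new_context["b"] = 2.0

print(new_context)
```

Evaluating against new_context rather than old_context means no cached intermediate value can be picked up, at the cost of recomputing everything that was not copied over.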
That's a good point about pf.conditional and pf.try_: essentially, it's not possible to determine statically which parts of the DAG would need to be recomputed when some upstream values change. However, we could be conservative and invalidate all dependencies that might need to be re-computed. In my use case I was looking for a way to both (a) cache results to avoid unnecessary re-computation, and (b) invalidate dependent parts of the DAG that definitely need to be re-computed. I ended up implementing a simple version where for (a) I use joblib's Memory class for caching, and for (b) I conservatively remove all descendants of a changed placeholder by recursively keeping track of the args of each op. (I used a variant of the GraphViz backend that was referenced elsewhere in this issue).
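The conservative invalidation in (b) could look roughly like the sketch below. It is not the code described above: it assumes the graph structure has already been extracted into a plain dict mapping each node name to the names it reads from, and it walks the graph iteratively rather than recursively. The function names, node names, and cached values are all hypothetical.

```python
from collections import defaultdict


def descendants(inputs, changed):
    """Return every node downstream of any node in `changed`.

    `inputs` maps each node to the nodes it reads from.
    """
    # Invert the mapping: for each node, which nodes consume it?
    consumers = defaultdict(set)
    for node, sources in inputs.items():
        for source in sources:
            consumers[source].add(node)

    stale, stack = set(), list(changed)
    while stack:
        current = stack.pop()
        for consumer in consumers[current]:
            if consumer not in stale:
                stale.add(consumer)
                stack.append(consumer)
    return stale


def invalidate(context, inputs, changed):
    """Drop every cached value that may depend on a changed node (in place)."""
    for node in descendants(inputs, changed):
        context.pop(node, None)
    return context


# Toy structure: 'offset' depends on 'c' only, so it survives a change to 'b'.
inputs = {"uniform": ["b"], "scaled_uniform": ["uniform"], "offset": ["c"]}
context = {"b": 2.0, "c": 5.0, "uniform": 0.13, "scaled_uniform": 1.3, "offset": 6.0}
invalidate(context, inputs, changed=["b"])
print(context)
```

Because every possible consumer is treated as stale, this may discard values that a conditional branch would never have touched, which is the price of staying conservative.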
Yes, a conservative approach would be an option but it would still require an interface addition: there is no way to access all (possible) dependencies of an operation at the moment. For a standard operation op, concatenating op.dependencies, op.args, and op.kwargs.values() is sufficient. However, for more complex operations, such as pf.conditional, there is no universal way to get all possible dependencies. I'm open to adding an option to retrieve all possible dependencies of an op, e.g. using a @property.
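To make the proposal concrete, here is a rough sketch of what such a property might look like. The Op and Conditional classes below are hypothetical stand-ins written for this example, not pythonflow classes, and the property name possible_dependencies is likewise an assumption:

```python
class Op:
    """Hypothetical stand-in for a standard operation."""

    def __init__(self, name, *args, dependencies=(), **kwargs):
        self.name = name
        self.args = args
        self.kwargs = kwargs
        self.dependencies = list(dependencies)

    @property
    def possible_dependencies(self):
        # Standard case: concatenate dependencies, args, and kwargs values,
        # keeping only the entries that are themselves operations.
        candidates = [*self.dependencies, *self.args, *self.kwargs.values()]
        return [c for c in candidates if isinstance(c, Op)]


class Conditional(Op):
    """Hypothetical stand-in for a conditional that evaluates only one branch."""

    def __init__(self, name, predicate, x, y=None, dependencies=()):
        super().__init__(name, dependencies=dependencies)
        self.predicate, self.x, self.y = predicate, x, y

    @property
    def possible_dependencies(self):
        # Conservative: report the predicate and both branches, whichever runs.
        branches = [self.predicate, self.x, self.y]
        extra = [branch for branch in branches if isinstance(branch, Op)]
        return super().possible_dependencies + extra


b = Op("b")
uniform = Op("uniform", 0, b)
flag = Op("flag")
fallback = Op("fallback")
choice = Conditional("choice", flag, uniform, fallback)

names = [op.name for op in choice.possible_dependencies]
print(names)
```

Letting each operation type override the property keeps the branching logic next to the operation that owns it, so a generic invalidation pass would not need to special-case conditionals.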
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think this is exactly the behaviour I was looking for too. One purpose of a computation graph is to make sure you re-calculate every dependency before using any variable (it seems this is what pythonflow is intended for). A different potential use of a computation graph is to re-compute only the variables that need to be re-computed due to a change in one or more input variables (I posted a summary of this requested functionality here and posted a pythonflow answer, but then realized it doesn't meet this requirement). If pythonflow can provide this functionality, I will update that answer.
I think pythonflow recalculates and then forgets every dependency every time you run a graph operation. Am I right?
Thanks.