natcap/taskgraph

Dictionary arguments cause combinatorial memory usage in `add_task`

richpsharp opened this issue · 0 comments

To reproduce:

import taskgraph

task_graph = taskgraph.TaskGraph('.', -1)
# Build a 4000-key dictionary whose values all reference the same small dict.
x = {None: None}
arg_dict = {}
for i in range(4000):
    arg_dict[i] = x

def my_op(my_dict):
    print(my_dict)

task_graph.add_task(func=my_op, args=(), kwargs={'my_dict': arg_dict})
task_graph.join()

Then observe heavy memory use before my_op is called. This is caused by a bug in Task._filter_non_files, which handles dictionary arguments incorrectly: instead of cleaning the dictionary's values, it builds a tuple for each element of the dictionary, and each tuple includes a copy of the original dictionary. Memory use therefore grows with the square of the number of keys.
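
The quadratic growth can be illustrated with a minimal sketch. This is not the taskgraph source: `filter_dict_buggy`, `filter_dict_fixed`, `clean_value`, and `count_entries` are hypothetical names that only mimic the behavior described above, and the real cleaning logic is assumed to be more involved.

```python
def clean_value(value):
    """Hypothetical stand-in for per-value cleaning; returns the value as-is."""
    return value


def filter_dict_buggy(arg_dict):
    """Mimic the described bug: one tuple per key, each holding a full copy."""
    return [(key, dict(arg_dict)) for key in arg_dict]


def filter_dict_fixed(arg_dict):
    """Assumed intended behavior: clean each value once, keep a single dict."""
    return {key: clean_value(value) for key, value in arg_dict.items()}


def count_entries(filtered):
    """Count the dictionary entries held by a filtered result."""
    if isinstance(filtered, dict):
        return len(filtered)
    return sum(len(copy) for _, copy in filtered)


n_keys = 200  # kept small here; the report above uses 4000
x = {None: None}
arg_dict = {i: x for i in range(n_keys)}

buggy_entries = count_entries(filter_dict_buggy(arg_dict))
fixed_entries = count_entries(filter_dict_fixed(arg_dict))
print(buggy_entries, fixed_entries)
```

Under these assumptions the buggy variant holds n_keys * n_keys entries and the fixed one holds n_keys, so the 4000-key dictionary from the reproduction would expand to roughly 16 million entries.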