pytoolz/toolz

Combination of map and accumulate

rindis opened this issue · 0 comments

I found myself needing something that seemed like a combination of map and accumulate, and thought it might be a useful addition to toolz.

The problem: you have a sequence of objects and want to transform each one, but each transformation depends on a running value that is updated as the earlier elements are processed.

Example

Given the following list of dicts:

data = [{'id': 'a', 'value': 1},
        {'id': 'b', 'value': 2},
        {'id': 'c', 'value': 3},
        {'id': 'd', 'value': 4}]

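I would like to take the number x = 4 and use it up against the 'value' property of each element in data, in order: each element's 'value' is reduced by the current x (never below zero), and x is reduced by that element's original 'value', until x becomes zero.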

This is the expected behaviour of a function that does what I want:

>>> data = [{'id': 'a', 'value': 1},
...         {'id': 'b', 'value': 2},
...         {'id': 'c', 'value': 3},
...         {'id': 'd', 'value': 4}]
>>> func(data, 4)
[{'id': 'a', 'value': 0}, {'id': 'b', 'value': 0}, {'id': 'c', 'value': 2}, {'id': 'd', 'value': 4}]

Notice that for the dictionaries with 'id': 'a' and 'id': 'b', the 'value' is 0. For 'id': 'c' the value is 2, because at this point the value of x is 1. For 'id': 'd' the value is the same as the original, because x is 0 at this point.
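To make the arithmetic above explicit, here is a small trace of x next to each resulting 'value'. It is only an illustration of the expected behaviour, using the bare numbers 1, 2, 3, 4 rather than the dicts; the name trace is hypothetical.

```python
values = [1, 2, 3, 4]  # the 'value' fields of a, b, c, d
x = 4

trace = []  # (x before the element, new value of the element)
for v in values:
    trace.append((x, max(v - x, 0)))
    x = max(x - v, 0)  # x never goes below zero

print(trace)  # [(4, 0), (3, 0), (1, 2), (0, 4)]
```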

A direct implementation of func could be:

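from toolz import assoc_in


def func(data, x_init):
    x = x_init  # amount still left to subtract
    new_data = []
    for elem in data:
        # Reduce this element's value by what is left of x (never below 0).
        new_data.append(assoc_in(elem, ['value'], max(elem['value'] - x, 0)))
        # Use up x by the element's original value (never below 0).
        x = max(x - elem['value'], 0)
    return new_data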

Which can be refactored to:

def f(acc, elem):
    return assoc_in(elem, ['value'], max(elem['value'] - acc, 0))


def g(acc, elem):
    return max(acc - elem['value'], 0)


def func(data, x_init):
    x = x_init
    new_data = []
    for elem in data:
        new_data.append(f(x, elem))
        x = g(x, elem)
    return new_data

This could be generalised as:

def mapacc(mapper, accumulator, sequence, accumulator_init):
    sequence = iter(sequence)
    acc = accumulator_init
    for elem in sequence:
        result = mapper(acc, elem)
        acc = accumulator(acc, elem)
        yield result

Which could be used to solve the above example by:

>>> list(mapacc(mapper=lambda acc, elem: assoc_in(elem, ['value'], max(elem['value'] - acc, 0)),  # Manipulate element using acc
...             accumulator=lambda acc, elem: max(acc - elem['value'], 0),  # Update acc using element 
...             sequence=data,
...             accumulator_init=4))
[{'id': 'a', 'value': 0}, {'id': 'b', 'value': 0}, {'id': 'c', 'value': 2}, {'id': 'd', 'value': 4}]
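For comparison, the same behaviour can probably be assembled from pieces that already exist, which may help when judging whether a dedicated function is worthwhile. The sketch below is not a proposed implementation: the name mapacc_via_accumulate is hypothetical, it uses itertools.accumulate with the initial keyword (Python 3.8+), and it builds the new dicts with dict(elem, value=...) instead of assoc_in so that it only needs the standard library.

```python
from itertools import accumulate, tee


def mapacc_via_accumulate(mapper, accumulator, sequence, accumulator_init):
    """Hypothetical equivalent of mapacc built on itertools.accumulate."""
    for_states, for_elems = tee(sequence)
    # Yields accumulator_init first, then the state after each element,
    # so the k-th state is the one in effect *before* the k-th element.
    states = accumulate(for_states, accumulator, initial=accumulator_init)
    # zip stops when the elements run out, dropping the final state.
    return (mapper(acc, elem) for acc, elem in zip(states, for_elems))


data = [{'id': 'a', 'value': 1},
        {'id': 'b', 'value': 2},
        {'id': 'c', 'value': 3},
        {'id': 'd', 'value': 4}]

result = list(mapacc_via_accumulate(
    mapper=lambda acc, elem: dict(elem, value=max(elem['value'] - acc, 0)),
    accumulator=lambda acc, elem: max(acc - elem['value'], 0),
    sequence=data,
    accumulator_init=4))
```

This works, but it needs tee and relies on zip truncating the extra final state, which is arguably less readable than a single mapacc call.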

EDIT

Updating the implementation to:

def mapacc(mapper, accumulator, sequence, accumulator_init):
    return zip(*_mapacc(mapper, accumulator, sequence, accumulator_init))

def _mapacc(mapper, accumulator, sequence, accumulator_init):
    sequence = iter(sequence)
    acc = accumulator_init
    for elem in sequence:
        result = mapper(acc, elem)
        acc = accumulator(acc, elem)
        yield result, acc

This version also returns the value of the accumulator after each step:

>>> x, y = mapacc(mapper=lambda acc, elem: assoc_in(elem, ['value'], max(elem['value'] - acc, 0)),  # Manipulate element using acc
...               accumulator=lambda acc, elem: max(acc - elem['value'], 0),  # Update acc using element
...               sequence=data,
...               accumulator_init=4)
>>> x
({'id': 'a', 'value': 0}, {'id': 'b', 'value': 0}, {'id': 'c', 'value': 2}, {'id': 'd', 'value': 4})
>>> y
(3, 1, 0, 0)
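One caveat with the zip(*...) approach: it consumes the whole generator up front, and for an empty sequence zip() yields nothing, so unpacking into two names raises a ValueError. A minimal sketch of a variant that always returns two tuples is shown below; the name mapacc_pairs is hypothetical, and dict(elem, value=...) stands in for assoc_in to keep the example free of dependencies.

```python
def mapacc_pairs(mapper, accumulator, sequence, accumulator_init):
    """Hypothetical eager variant: returns (results, accumulator states)."""
    results, states = [], []
    acc = accumulator_init
    for elem in sequence:
        results.append(mapper(acc, elem))  # uses the state *before* elem
        acc = accumulator(acc, elem)
        states.append(acc)                 # records the state *after* elem
    return tuple(results), tuple(states)


data = [{'id': 'a', 'value': 1},
        {'id': 'b', 'value': 2},
        {'id': 'c', 'value': 3},
        {'id': 'd', 'value': 4}]


def mapper(acc, elem):
    return dict(elem, value=max(elem['value'] - acc, 0))


def accumulator(acc, elem):
    return max(acc - elem['value'], 0)


x, y = mapacc_pairs(mapper, accumulator, data, 4)
empty_x, empty_y = mapacc_pairs(mapper, accumulator, [], 4)  # no error
```

The trade-off is that this version is no longer lazy, but neither is the zip(*...) version once its result is unpacked.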