pytoolz/toolz

The "collect" decorator

maxme1 opened this issue ยท 9 comments

Hi!
I often encounter the same pattern in my code:

def to_power(array, powers):
    result = []
    for entry in array:
        power = powers.get(entry)
        if power is not None and power >= 0:
            result.append(entry ** power)

    return result


def reverse_mapping(expensive_func, values):
    result = {}
    for entry in values:
        value = expensive_func(entry)
        if value is not None:
            result[value] = entry

    return result

The examples are somewhat simplistic, but you get the idea:

  1. create a container
  2. iterate some data, do some branching, etc and fill the container
  3. return the container

I came up with several decorators that reduce this to:

@collect
def to_power(array, powers):
    for entry in array:
        power = powers.get(entry)
        if power is not None and power >= 0:
            yield entry ** power


@composed(dict)
def reverse_mapping(expensive_func, values):
    for entry in values:
        value = expensive_func(entry)
        if value is not None:
            yield value, entry

composed(func) simply applies func to the result of the decorated function, which effectively gathers the generator. And collect is just a shorter version of composed(list)

I can create a PR with my implementation, if you are interested in adding it to toolz.

This seems like a really worthwhile ergonomics boon. If I was the maintainer of toolz I would be leaning hard towards "yes" on this.

Goes well with #486 - if the starcall function described there is available, then this creates a nicer way of defining functions that return args or kwargs for the next function in the composition.

I'm imagining @composed(Arguments.wrap) on top of @collect or @composed(dict).

Could maybe even enhance the Arguments in that other issue enable something like @composed(Arguments.build) on functions that do yield value, yield 'name', value, or both.

I decided to add a PR in case you decide to add it to toolz

So... I'm not a big fan.

In my mind, the functional style of programming is about separating the "business" logic from the iteration logic that feeds them data.

These decorators encourage people to stop before achieving such separation.

consider the following:

@curry
def raise(powers, entry):
    power = powers.get(entry)
    return entry ** power if power is not None and power >= 0 else None

def to_power(powers):
    return compose_left(
      map(raise(powers))
    , remove(identity)
    , list
    )
def reverse_mapping(expensive_func):
    return compose_left( 
         map(juxt(expensive_func, identity))
        , dict
        , keyfilter(identity)
    )

@ZeroBomb for what it's worth, I find your first alternative harder to understand than either the manual list-appending version or this "collect into list" decorator proposal.

(I had the luck of trying to understand your alternatives from a clean slate, because I totally forgot the examples from the beginning of this issue.)

The biggest reason is that the relevant information gets spread out further apart and I feel myself having to think and remember more across the two functions to re-connect it.

I suggest looking at this proposal more abstractly:

  1. composed is a decorator version of compose + partial. That can be pretty nice shorthand. Some of the arguments from PEP-318 in favor of decorators apply: some things are clearer if they immediately precede a function rather than being buried below it.

    Example: @composed(log_result, prefix="whatever") on top of def foo instead of foo = compose_left(foo, partial(log_result, prefix="whatever")) below the whole body of foo.

  2. "Collect yielded values into list/dict/whatever" adds ergonomics for building arbitrary collections. A good option in situations where literals, list/dict/set comprehensions, or map/filter start to get awkward. Sometimes there's a lot of coupling between multiple loops; sometimes logic can be more readably expressed in a full function body.

@mentalisttraceur you're reading my mind!
I mostly use these decorators as a cleaner way to write "comprehension-like" code, when I have to chew through some data.

@ZeroBomb
Consider this code (almost verbatim from one of my projects):

@collect
def interpolate_dict(slices, size, default):
    for index in range(size):
        # find_nearest_keys's contents are not relevant for the example
        start, stop = find_nearest_keys(slices, index)
        if index in slices:
            yield slices[index]

        elif start is not None and stop is not None:
            try:
                # interpolate's contents are not relevant for the example
                yield interpolate(slices[start], slices[stop], (index - start) / (stop - start))
            except InterpolationError:
                yield default

        else:
            yield []

Here you could do some comprehensions and exceptions rerouting and maybe define a tiny function that calls find_nearest_keys inside it (because we need a scope for start and stop). But would it be more readable? In my experience when you're working with some irregular data it's much cleaner to write a loop, than an one-off-function.

So I will admit to writing decorators like the ones in this proposal in the past, but ultimately I would end up removing them because I was using them in ways that ended up being counter-productive.

The pattern I observed in my own coding would go like this:

  1. I write my functions simple and focused on the problem at hand (good!)
  2. Next, I want to use them someplace, but it would be more convenient if they (automatically mapped themselves over dataframes/formatted their results a different way/did some kind of post processing/etc)
  3. I know! I'll decorate the simple functions so they do that!
  4. Wait, now I need the functionality of the underlying function, but without the decoration, or with the results formatted a different way...
  5. I remove the decoration and put the responsibility for pre- or post- processing the function's data where the pre- or post- processing is needed, instead of permanently attaching it to the simple function.

In @maxme1 's example, I would gravitate towards something like this:

@curry
def interpolate_index(slices, index, default=None):
    # business code, and nothing but.  
    # This describes exactly the process for calculating the interpolated value at an index
    if index in slices:
        return slices[index]
    start, stop = find_nearest_keys(slices, index)
    if start and stop:
       try:
           return interpolate(slices[start], slices[stop], (index - start) / (stop - start))
       except InterpolationError:
           return default
    return []

def interpolate_dict(slices, size, default=None):
    # This describes the process for interpolating over a range of values
    return map(interpolate_index(slices, default=default), range(size))

# More generally
def apply_interpolator(interpolator, size):
    # This describes the process for interpolating over a range of values
    return map(interpolator, range(size))

There is a sense in which you have to keep two things in mind (the mapping function and the mapped function), but I would argue that:

  1. Mapping functions over collections of values is a fundamental concept in the functional style of programming, and the toolz library should encourage this style.
  2. The interpolate_index function is the basic "unit of work" in this algorithm, and setting it apart actually clarifies what you are trying to accomplish.
  3. Putting the iteration in a separate function gives you the ability to trivially add e.g. a parallelized version of the iterpolate_dict function without touching your business logic at all.

Isn't writing code like this a bit too much of future-proofing?
Yes, in perspective I could restructure my code as you suggested, but if I need the function right now, writing a single function with a decorator is simpler and (arguably) more readable.

But I see your point. You're basically saying that decorators like these are against the philosophy of toolz, and this is a strong argument ๐Ÿ™‚

@ZeroBomb @mentalisttraceur maybe you could suggest another package where this functionality will fit?

Thanks for the feedback, everyone!
I didn't think of anything simpler than moving these 2 decorators to a separate library - jboc (Just a Bunch Of Code).
Pretty dumb, but gets the job done ๐Ÿ™‚