The "collect" decorator
maxme1 opened this issue · 9 comments
Hi!
I often encounter the same pattern in my code:
```python
def to_power(array, powers):
    result = []
    for entry in array:
        power = powers.get(entry)
        if power is not None and power >= 0:
            result.append(entry ** power)
    return result


def reverse_mapping(expensive_func, values):
    result = {}
    for entry in values:
        value = expensive_func(entry)
        if value is not None:
            result[value] = entry
    return result
```
The examples are somewhat simplistic, but you get the idea:
- create a container
- iterate over some data, do some branching, etc., and fill the container
- return the container
I came up with several decorators that reduce this to:
```python
@collect
def to_power(array, powers):
    for entry in array:
        power = powers.get(entry)
        if power is not None and power >= 0:
            yield entry ** power


@composed(dict)
def reverse_mapping(expensive_func, values):
    for entry in values:
        value = expensive_func(entry)
        if value is not None:
            yield value, entry
```
`composed(func)` simply applies `func` to the result of the decorated function, which effectively gathers the generator. And `collect` is just a shorter version of `composed(list)`.
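For concreteness, here is a minimal sketch of what these decorators could look like (an illustration of the idea described above, not necessarily the PR's actual code):

```python
from functools import wraps


def composed(func, *args, **kwargs):
    """Return a decorator that applies ``func`` (plus any extra
    arguments) to the decorated function's return value --
    here, exhausting a generator."""
    def decorator(generator_func):
        @wraps(generator_func)
        def wrapper(*call_args, **call_kwargs):
            return func(generator_func(*call_args, **call_kwargs), *args, **kwargs)
        return wrapper
    return decorator


# ``collect`` is just ``composed(list)`` under a shorter name
collect = composed(list)
```

With this sketch, the decorated `to_power` above returns a `list`, and `reverse_mapping` builds a `dict` from the yielded pairs.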
I can create a PR with my implementation, if you are interested in adding it to toolz.
This seems like a really worthwhile ergonomics boon. If I were the maintainer of `toolz`, I would be leaning hard towards "yes" on this.
Goes well with #486: if the `starcall` function described there is available, then this creates a nicer way of defining functions that return `args` or `kwargs` for the next function in the composition. I'm imagining `@composed(Arguments.wrap)` on top of `@collect` or `@composed(dict)`.

Could maybe even enhance the `Arguments` from that other issue to enable something like `@composed(Arguments.build)` on functions that do `yield value`, `yield 'name', value`, or both.
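Purely to illustrate that last idea, here is a guess at what such a collector could look like. Everything below is hypothetical: `Arguments` belongs to the proposal in #486, not to toolz, and `build` here is only one possible reading of the suggested semantics.

```python
class Arguments:
    """Hypothetical container from #486: positional plus keyword arguments."""

    def __init__(self, args, kwargs):
        self.args = tuple(args)
        self.kwargs = dict(kwargs)

    @classmethod
    def build(cls, items):
        # Bare ``yield value`` contributes a positional argument;
        # ``yield 'name', value`` contributes a keyword argument.
        # (A real design would need a less ambiguous convention for
        # values that happen to be ('string', ...) pairs.)
        args, kwargs = [], {}
        for item in items:
            if isinstance(item, tuple) and len(item) == 2 and isinstance(item[0], str):
                kwargs[item[0]] = item[1]
            else:
                args.append(item)
        return cls(args, kwargs)
```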
I went ahead and created a PR, in case you decide to add it to toolz.
So... I'm not a big fan.
In my mind, the functional style of programming is about separating the "business" logic from the iteration logic that feeds them data.
These decorators encourage people to stop before achieving such separation.
Consider the following:
```python
from toolz import compose_left, curry, identity, juxt
from toolz.curried import keyfilter, map, remove


@curry
def raise_(powers, entry):  # "raise" is a reserved word, hence the underscore
    power = powers.get(entry)
    return entry ** power if power is not None and power >= 0 else None


def to_power(powers):
    return compose_left(
        map(raise_(powers)),
        remove(lambda x: x is None),  # drop entries that had no valid power
        list,
    )


def reverse_mapping(expensive_func):
    return compose_left(
        map(juxt(expensive_func, identity)),
        dict,
        keyfilter(lambda key: key is not None),
    )
```
@ZeroBomb for what it's worth, I find your first alternative harder to understand than either the manual list-appending version or this "collect into list" decorator proposal.
(I had the luck of trying to understand your alternatives from a clean slate, because I totally forgot the examples from the beginning of this issue.)
The biggest reason is that the relevant information gets spread further apart, and I find myself having to think and remember more across the two functions to re-connect it.
I suggest looking at this proposal more abstractly:
- `composed` is a decorator version of `compose` + `partial`. That can be pretty nice shorthand. Some of the arguments from PEP-318 in favor of decorators apply: some things are clearer if they immediately precede a function rather than being buried below it. Example: `@composed(log_result, prefix="whatever")` on top of `def foo` instead of `foo = compose_left(foo, partial(log_result, prefix="whatever"))` below the whole body of `foo`.
- "Collect yielded values into `list`/`dict`/whatever" adds ergonomics for building arbitrary collections. A good option in situations where literals, list/dict/set comprehensions, or `map`/`filter` start to get awkward. Sometimes there's a lot of coupling between multiple loops; sometimes logic can be more readably expressed in a full function body. (See the example below.)
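For instance, sticking with the sketch earlier in the thread, any callable that consumes an iterable can serve as the collector (`unique_words` and `word_counts` are made-up examples):

```python
from collections import Counter


@composed(set)
def unique_words(lines):
    for line in lines:
        yield from line.split()


@composed(Counter)
def word_counts(lines):
    for line in lines:
        yield from line.split()
```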
@mentalisttraceur you're reading my mind!
I mostly use these decorators as a cleaner way to write "comprehension-like" code, when I have to chew through some data.
@ZeroBomb
Consider this code (almost verbatim from one of my projects):
```python
@collect
def interpolate_dict(slices, size, default):
    for index in range(size):
        # find_nearest_keys's contents are not relevant for the example
        start, stop = find_nearest_keys(slices, index)
        if index in slices:
            yield slices[index]
        elif start is not None and stop is not None:
            try:
                # interpolate's contents are not relevant for the example
                yield interpolate(slices[start], slices[stop], (index - start) / (stop - start))
            except InterpolationError:
                yield default
        else:
            yield []
```
Here you could do some comprehensions and exception rerouting, and maybe define a tiny function that calls `find_nearest_keys` inside it (because we need a scope for `start` and `stop`). But would it be more readable? In my experience, when you're working with irregular data it's much cleaner to write a loop than a one-off function.
So I will admit to writing decorators like the ones in this proposal in the past, but ultimately I would end up removing them because I was using them in ways that ended up being counter-productive.
The pattern I observed in my own coding would go like this:
- I write my functions simple and focused on the problem at hand (good!)
- Next, I want to use them someplace, but it would be more convenient if they (automatically mapped themselves over dataframes/formatted their results a different way/did some kind of post processing/etc)
- I know! I'll decorate the simple functions so they do that!
- Wait, now I need the functionality of the underlying function, but without the decoration, or with the results formatted a different way...
- I remove the decoration and put the responsibility for pre- or post- processing the function's data where the pre- or post- processing is needed, instead of permanently attaching it to the simple function.
In @maxme1's example, I would gravitate towards something like this:
```python
from toolz import curry


@curry
def interpolate_index(slices, index, default=None):
    # Business code, and nothing but.
    # This describes exactly the process for calculating the interpolated value at an index.
    if index in slices:
        return slices[index]
    start, stop = find_nearest_keys(slices, index)
    if start is not None and stop is not None:
        try:
            return interpolate(slices[start], slices[stop], (index - start) / (stop - start))
        except InterpolationError:
            return default
    return []


def interpolate_dict(slices, size, default=None):
    # This describes the process for interpolating over a range of values
    return map(interpolate_index(slices, default=default), range(size))


# More generally:
def apply_interpolator(interpolator, size):
    # This describes the process for interpolating over a range of indices
    return map(interpolator, range(size))
```
There is a sense in which you have to keep two things in mind (the mapping function and the mapped function), but I would argue that:
- Mapping functions over collections of values is a fundamental concept in the functional style of programming, and the toolz library should encourage this style.
- The `interpolate_index` function is the basic "unit of work" in this algorithm, and setting it apart actually clarifies what you are trying to accomplish.
- Putting the iteration in a separate function gives you the ability to trivially add e.g. a parallelized version of the `interpolate_dict` function without touching your business logic at all (see the sketch below).
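For instance, a parallel variant only needs to swap the mapping step. A rough sketch using the standard library (assuming `interpolate_index` and its arguments pickle cleanly; `interpolate_dict_parallel` and `max_workers` are names made up here):

```python
from concurrent.futures import ProcessPoolExecutor


def interpolate_dict_parallel(slices, size, default=None, max_workers=None):
    # Same iteration as interpolate_dict, but mapped across processes;
    # the business code in interpolate_index is untouched.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(interpolate_index(slices, default=default), range(size)))
```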
Isn't writing code like this a bit too much future-proofing?
Yes, in the long run I could restructure my code as you suggested, but if I need the function right now, writing a single function with a decorator is simpler and (arguably) more readable.
But I see your point. You're basically saying that decorators like these are against the philosophy of `toolz`, and that is a strong argument.
@ZeroBomb @mentalisttraceur maybe you could suggest another package where this functionality would fit?
Thanks for the feedback, everyone!
I couldn't think of anything simpler than moving these two decorators to a separate library: `jboc` (Just a Bunch Of Code).
Pretty dumb, but it gets the job done.