eager map & filter?
bijoythomas opened this issue · 6 comments
Hello, I'm new to toolz and am trying out the functions in the curried namespace. The code below
from toolz.curried import *
is_even = lambda n: n % 2 == 0
inc = lambda n: n + 1
compose(
map(inc),
filter(is_even)
)([1,2,3,4])
returns a map object instead of a list (which I was expecting). However,
compose(
groupby(lambda n: "A" if n < 2 else "B"),
map(lambda n: n + 1),
filter(lambda n: n %2 == 0)
)([1,2,3,4])
returns a dict with list values, as expected, instead of a dict with sub-iterators (like itertools.groupby).
Is there a reason for keeping the curried map & filter lazy, like the native Python 3 functions?
toolz.groupby and itertools.groupby are not equivalent functions. itertools.groupby creates a new group every time the key function changes value. This effectively requires the input iterator to be sorted by the key function. toolz.groupby makes no such assumption, so it has to read the entire input before it can return any group. This is the reason why itertools.groupby is lazy and toolz.groupby is not.
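For anyone skimming, here is a rough sketch of that difference using only the standard library. The eager_groupby helper is a hypothetical stand-in for dict-style grouping, written for illustration; it is not toolz's actual implementation.

```python
from collections import defaultdict
from itertools import groupby as lazy_groupby


def eager_groupby(key, seq):
    """Hypothetical stand-in for eager, dict-returning grouping."""
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)


def is_even(n):
    return n % 2 == 0


data = [1, 2, 3, 4]

# itertools.groupby starts a new group whenever the key changes, so on
# unsorted input the same key can show up more than once.
runs = [(k, list(g)) for k, g in lazy_groupby(data, key=is_even)]

# Dict-based grouping must read the whole input before returning,
# but it does not depend on the input order.
grouped = eager_groupby(is_even, data)

print(runs)
print(grouped)
```

On this unsorted input the itertools version yields four single-item runs, while the dict version collects two groups.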
map and filter have always been lazy in toolz. When toolz supported Python 2, map was an alias for itertools.imap and filter was an alias for itertools.ifilter. In Python 3, they are simply their respective builtin functions.
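To make the laziness concrete, a small sketch with the Python 3 builtin map (no toolz needed). The calls list and the inc helper are only there for illustration, to record when the function actually runs.

```python
calls = []


def inc(n):
    calls.append(n)  # record each time the function actually runs
    return n + 1


lazy = map(inc, [1, 2, 3])
calls_before = list(calls)  # snapshot: inc has not been called yet

result = list(lazy)         # consuming the iterator is what runs inc
calls_after = list(calls)

leftover = list(lazy)       # a map object is exhausted after one pass

print(calls_before, result, calls_after, leftover)
```

Wrapping the lazy object in list is enough to get the values the original question expected.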
Good questions @bijoythomas and thanks for the quick, informative reply @groutr. I always like to hear experiences of new users. Since the questions have been answered, can we close this issue?
Btw, we have considered having a non-lazy namespace so one could do things like toolz.eager.map(func, data). I'm open to this idea. When teaching, learning, or exploring, it can be helpful to effortlessly see the data instead of a lazy object. One challenge is how to have a curried, eager namespace? Would it be toolz.eager.curried, toolz.curried.eager, both, or something else?
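Purely as an illustration of what such eager variants might do, here is a sketch using only the standard library. The names eager, eager_map and eager_filter are hypothetical and do not exist in toolz; functools.partial stands in for currying.

```python
from functools import partial


def eager(lazy_fn):
    """Hypothetical helper: make an iterator-returning function return a list."""
    def wrapper(*args, **kwargs):
        return list(lazy_fn(*args, **kwargs))
    return wrapper


# Hypothetical eager counterparts of the builtins.
eager_map = eager(map)
eager_filter = eager(filter)

# functools.partial plays the role of currying in this sketch.
inc_all = partial(eager_map, lambda n: n + 1)
evens = partial(eager_filter, lambda n: n % 2 == 0)

result = inc_all(evens([1, 2, 3, 4]))
print(result)
```

Each step returns a plain list here, which is convenient for exploring but gives up the memory benefits of lazy pipelines.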
Agreed. I had these same questions myself. Good to know.
One thing I'd say is that since this library no longer differentiates map and filter from the builtins, the docs should not show them being imported from the itertoolz library or any library. Seeing from toolz import map created some confusion while reading the documentation.
@startakovsky there is a difference between the built-in map and toolz.curried.map, since the second one is, of course, curried.
@eriknw I would suggest keeping everything lazy and just adding a consumer function that enforces eager evaluation. For instance, to eagerly evaluate a map object, you can just build a list out of it: map(f, it) is eagerly consumed by list(map(f, it)). This is usually what I do if I wish to retain the computed values. If the values are not important and can be safely discarded, I usually resort to more_itertools.consume, which is a lot faster and doesn't store anything. I think that this is also related to #445.
This way, there would be no need for an eager namespace. Everything would be lazy by default, and if you want eager evaluation you either build a list (if you wish to keep the values) or consume your iterable.
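A minimal sketch of the two options, using only the standard library. The consume function below follows the well-known recipe from the itertools documentation (a zero-length deque); it is written out here for illustration rather than imported from more_itertools.

```python
from collections import deque


def consume(iterable):
    """Exhaust an iterable and discard its values (itertools-recipe style)."""
    deque(iterable, maxlen=0)


seen = []
lazy = map(seen.append, [1, 2, 3])

# Option 1: only the side effects matter, so nothing is stored.
returned = consume(lazy)

# Option 2: the values matter, so build a list.
kept = list(map(lambda n: n + 1, [1, 2, 3]))

print(seen, returned, kept)
```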
@ruancomelli note that toolz.last is already basically consume. (The only difference is that it returns the last value, whereas I'd expect consume to return nothing. Internally the current implementation builds on tail and thus stores a deque, but that O(1) overhead is modest and presents a lower bar of difficulty for being automatically optimized away by the Python implementation compared to the O(n) of list.)
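The idea can be sketched with a one-slot deque from the standard library. The name last_via_deque is hypothetical and this is only an approximation of the approach described above, not toolz's source.

```python
from collections import deque


def last_via_deque(iterable):
    """Hypothetical sketch: drain an iterable, remembering only the final item."""
    # maxlen=1 keeps memory use constant regardless of input length.
    return deque(iterable, maxlen=1)[0]


seen = []


def record(n):
    seen.append(n)  # side effect, to show every item was visited
    return n * 10


final = last_via_deque(map(record, [1, 2, 3]))
print(final, seen)
```

Note that this sketch raises IndexError on an empty iterable, since there is no last item to return.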
@eriknw I think the answer to curried.eager vs eager.curried is that curried is a higher-level/more-general operation (curry(f) makes sense for any f, even if f is eager.f, but eagerness is much more specific to just iteration functions), so it should go first: toolz.curried.eager.
"Both" might also make sense as a user/developer improvement, but on the other hand by only having one, that's more Pythonic ("there should only be one way to do it"), it kinda teaches the outer-name-scope-should-be-more-general pattern by example, and there's no breaking change in starting with just one and switching to both later if it proves to be a usability problem.