insitro/redun

[TypeError: Expressions of unknown length cannot be iterated] when trying to iterate on task output


Hi there,

Thanks for creating this framework; I'm just starting to learn it. Here's an issue I often get stuck on: when I try to consume the output of a task (e.g., iterate through the list/dict it returns), I always get this error: [TypeError: Expressions of unknown length cannot be iterated].

If I pass the output directly to another task, it works fine. I'm slightly confused since the task double is doing essentially the same thing as the code in main which is also a redun task itself. My guess is that it has something to do with the lazy evaluation. My main concern is that if the code gets more complicated, I either have to encapsulate lots of logic into one giant task, or break the code down into tiny tasks whenever I have to iterate on the previous result. If this is intended, could you point me in the right direction?

Here's a minimal repro:

from redun import task

redun_namespace = "redun.example.test"


@task()
def get_list():
    return [1, 2, 3, 4, 5]


@task()
def double(_list: list):
    return [each * 2 for each in _list]


@task()
def main():
    data = get_list()
    # return double(data)  # this is fine
    return [each * 2 for each in data]  # this raises the TypeError above

Thank you!

Thanks @lyin-vir for the question.

If I pass the output directly to another task, it works fine.

Yep, this is a common way to approach the issue.

I'm slightly confused since the task double is doing essentially the same thing as the code in main which is also a redun task itself. My guess is that it has something to do with the lazy evaluation.

Your understanding is correct. data in main() is lazy (its type is TaskExpression), and we don't know how long it's going to be until it has been evaluated. Therefore, an eager statement like the for-loop in the list comprehension fails.
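To make the lazy-evaluation point concrete, here is a toy model in plain Python. It is only an illustration under simplifying assumptions, not how redun is implemented: LazyExpr, lazy_task, and evaluate are hypothetical names standing in for redun's TaskExpression, the @task decorator, and the scheduler. Calling a decorated function only records the call; evaluate computes the arguments before running a function body and keeps evaluating any expression that body returns. Iterating an expression that has not been evaluated yet has nothing to loop over, which mirrors the error in the question.

```python
from typing import Any, Callable, List


class LazyExpr:
    """Toy stand-in for a lazy task result (hypothetical, not redun's TaskExpression)."""

    def __init__(self, func: Callable[..., Any], *args: Any) -> None:
        self.func = func
        self.args = args

    def __iter__(self):
        # The value has not been computed yet, so its length is unknown.
        raise TypeError("Expressions of unknown length cannot be iterated")


def evaluate(value: Any) -> Any:
    """Toy scheduler: resolve arguments, run the body, then evaluate its result."""
    if not isinstance(value, LazyExpr):
        return value  # plain value (this toy does not look inside lists or dicts)
    args = [evaluate(arg) for arg in value.args]
    return evaluate(value.func(*args))


def lazy_task(func: Callable[..., Any]) -> Callable[..., LazyExpr]:
    """Hypothetical decorator: calling the function records an expression."""

    def wrapper(*args: Any) -> LazyExpr:
        return LazyExpr(func, *args)

    return wrapper


@lazy_task
def get_list() -> List[int]:
    return [1, 2, 3, 4, 5]


@lazy_task
def double(items: List[int]) -> List[int]:
    # evaluate() has already turned the argument into a concrete list here.
    return [each * 2 for each in items]


@lazy_task
def main_ok():
    data = get_list()  # a LazyExpr, not a list
    return double(data)  # fine: double's body sees the computed list


@lazy_task
def main_bad():
    data = get_list()  # a LazyExpr, not a list
    return [each * 2 for each in data]  # iterating it raises TypeError


print(evaluate(main_ok()))  # [2, 4, 6, 8, 10]
try:
    evaluate(main_bad())
except TypeError as err:
    print(err)  # Expressions of unknown length cannot be iterated
```

In this model, main_ok works because the list is only iterated inside double, after evaluate has resolved the argument. main_bad fails because its body runs while data is still an unevaluated expression.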

My main concern is that if the code gets more complicated, I either have to encapsulate lots of logic into one giant task, or break the code down into tiny tasks whenever I have to iterate on the previous result. If this is intended, could you point me in the right direction?

We find it is common to break work down into small tasks, so that is a perfectly reasonable way to design a workflow.

One more tip, in case it helps: if you just want to map a task over the lazy list, you can use redun.functools.map_. Here is what it would look like in your example:

from redun import task
from redun.functools import map_

redun_namespace = "redun.example.test"

@task()
def get_list():
    return [1, 2, 3, 4, 5]

@task()
def double(x):
    return 2 * x

@task()
def main():
    data = get_list()
    return map_(double, data)

This has the added benefit of running the double calls in parallel.
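For intuition about why map_ can run the doubles in parallel, here is a rough plain-Python analogue. It is a sketch under stated assumptions, not redun's implementation: Deferred and lazy_map are hypothetical names, and a ThreadPoolExecutor stands in for redun's scheduler and executors. The map waits until the upstream list is known, then issues one independent call per element, so the calls can overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


class Deferred:
    """Hypothetical stand-in for a lazy task result, computed on demand."""

    def __init__(self, compute: Callable[[], Any]) -> None:
        self.compute = compute


def lazy_map(func: Callable[[Any], Any], items: Deferred) -> Deferred:
    """Hypothetical analogue of map_: defer until `items` is known, then
    run one independent call of `func` per element on a thread pool."""

    def compute() -> List[Any]:
        concrete = items.compute()  # only now is the length known
        with ThreadPoolExecutor() as pool:
            # pool.map returns results in input order.
            return list(pool.map(func, concrete))

    return Deferred(compute)


def get_list() -> Deferred:
    return Deferred(lambda: [1, 2, 3, 4, 5])


def double(x: int) -> int:
    time.sleep(0.2)  # simulate a slow task so any overlap is measurable
    return 2 * x


start = time.perf_counter()
result = lazy_map(double, get_list()).compute()
elapsed = time.perf_counter() - start
print(result)  # [2, 4, 6, 8, 10]
print(f"elapsed: {elapsed:.2f}s")
```

With five calls that each sleep 0.2 s, the elapsed time should be close to 0.2 s rather than the 1 s a sequential loop would take, since a default ThreadPoolExecutor in Python 3.8 and later has at least five workers. In redun itself, how much actually runs concurrently depends on the configured executor.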

I hope this helps.

@mattrasmus Thank you very much for the clarification.