redun does not execute any code for a simple example
I'm trying a simple example with `redun==0.8.7`:
```python
from redun import task, File
import pandas as pd

PATH = File("input.csv")

@task
def load_data(path):
    return pd.read_csv(path)

@task
def main(path: File = PATH) -> File:
    data = load_data(path)
    data.to_csv("data.csv")
    return File("./data.csv")
```
where `input.csv` is simply:

```
$ cat input.csv
x,
1,
2,
3,
```
Running redun then gives:

```
$ redun run cli.py main
[redun] redun :: version 0.8.7
[redun] config dir: .redun
[redun] Upgrading db from version -1.0 to 3.1...
[redun] Tasks will require namespace soon. Either set namespace in the `@task` decorator or with the module-level variable `redun_namespace`.
tasks needing namespace: cli.py:load_data, cli.py:main
[redun] Start Execution f6e622ee-907c-4500-8673-464f8e5a12b6: redun run cli.py main
[redun] Run Job c3c581e5: main(path=File(path=input.csv, hash=98c594bd)) on default
[redun]
[redun] | JOB STATUS 2022/04/28 00:18:12
[redun] | TASK    PENDING RUNNING  FAILED  CACHED    DONE   TOTAL
[redun] |
[redun] | ALL           0       0       0       0       1       1
[redun] | main          0       0       0       0       1       1
[redun]
[redun] Execution duration: 0.14 seconds
File(path=./data.csv, hash=65b8e975)
```
But no file is produced and the `load_data()` task never runs.

Is this a bug, or am I doing something wrong?
FWIW: modifying the code as below (to properly handle `File`, I think?) does not help:

```python
from redun import task, File
import pandas as pd

INPUT = File("input.csv")

@task
def load_data(input: File) -> pd.DataFrame:
    return pd.read_csv(input.path) + 1

@task
def main(input: File = INPUT) -> File:
    data = load_data(input)
    data.to_csv("data.csv")
    return File("./data.csv")
```
Hey @elanmart, if you restructure your code like this, it works:
```python
from redun import task, File
import pandas as pd

INPUT = File("input.csv")

@task
def load_data(input: File) -> File:
    df = pd.read_csv(input.path) + 1
    df.to_csv("data.csv", index=False)
    return File("data.csv")

@task
def main(data: File = INPUT) -> File:
    data = load_data(data)
    return data
```
Thanks @ricomnl for the suggestion.

@elanmart Thanks for the question. It's not a bug, but I agree it's surprising, and it's likely hard to see why `load_data` didn't run. This is probably a good example of something to highlight better in the docs (perhaps in the FAQ).
If we take your example and add a `print` statement, we see that `data.to_csv("data.csv")` is a lazy expression:
```python
@task
def main(path: File = PATH) -> File:
    data = load_data(path)
    x = data.to_csv("data.csv")
    print(x)
    return File("./data.csv")
```
which prints:

```
SimpleExpression('call', (SimpleExpression('getattr', (TaskExpression('load_data', (File(path=input.csv, hash=3c1348a5),), {}), 'to_csv'), {}), ('data.csv',), {}), {})
```
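To see why attribute access and calls on a task result produce a nested structure like the one printed above, here is a minimal plain-Python sketch. The `Expr` class is a hypothetical stand-in, not redun's actual `SimpleExpression`/`TaskExpression` implementation. It only assumes that a task call returns a placeholder whose attribute lookups and calls are recorded rather than executed.

```python
class Expr:
    """Hypothetical lazy value: records operations instead of running them."""

    def __init__(self, kind, args):
        self.kind = kind  # "task", "getattr" or "call"
        self.args = args

    def __getattr__(self, name):
        # Only called for attributes not set in __init__ (e.g. "to_csv"):
        # build a new "getattr" node instead of looking anything up.
        return Expr("getattr", (self, name))

    def __call__(self, *args):
        # Calling the recorded attribute builds a "call" node; nothing runs.
        return Expr("call", (self, args))

    def __repr__(self):
        return f"Expr({self.kind!r}, {self.args!r})"


def load_data(path):
    # Hypothetical lazy "task": returns a placeholder, not a DataFrame.
    return Expr("task", ("load_data", path))


data = load_data("input.csv")
x = data.to_csv("data.csv")  # no file is written here
print(x)
# Expr('call', (Expr('getattr', (Expr('task', ('load_data', 'input.csv')), 'to_csv')), ('data.csv',)))
```

Like the real output, the result is a tree of `call` over `getattr` over the task expression. Unless something evaluates that tree, `to_csv` is never called.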
redun will only evaluate that lazy expression if you "use" it, which in this case means returning it from `main`. However, in your example above, you discard it. It may feel like you "used" it because you do `return File("./data.csv")`, but that value is not derived from `x` (in my example), so redun has no way to know that the returned file depends on the `to_csv` call.
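A minimal plain-Python sketch of this "only what you return gets evaluated" rule follows. `Lazy`, `evaluate` and the list-returning `load_data`/`save_data` are hypothetical stand-ins (no redun or pandas involved). The sketch assumes only that the scheduler evaluates the expression tree reachable from the task's return value.

```python
calls = []  # records which hypothetical "tasks" actually ran


class Lazy:
    """Hypothetical deferred call: stores a function and its arguments."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def evaluate(value):
    """Recursively evaluate a Lazy tree; plain values are returned unchanged."""
    if isinstance(value, Lazy):
        args = [evaluate(arg) for arg in value.args]
        return value.fn(*args)
    return value


def load_data(path):
    calls.append("load_data")
    return [1, 2, 3]  # stand-in for a DataFrame


def save_data(data):
    calls.append("save_data")
    return "data.csv"  # stand-in for File("data.csv")


def main_discarded():
    data = Lazy(load_data, "input.csv")
    Lazy(save_data, data)  # built, then discarded: never evaluated
    return "./data.csv"    # not derived from the expressions above


def main_returned():
    data = Lazy(load_data, "input.csv")
    return Lazy(save_data, data)  # returned, so the whole chain runs


evaluate(main_discarded())
print(calls)  # []
evaluate(main_returned())
print(calls)  # ['load_data', 'save_data']
```

`main_discarded` mirrors the original example, while `main_returned` mirrors the fixed versions in this thread, where the returned `File` comes from a task call.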
This is why @ricomnl's change works: the returned `File` is the output of `load_data`, so redun has to run it. Let me know if this explanation helps.
Thanks @ricomnl and @mattrasmus, it makes perfect sense now.
It would indeed be great if this were included in the docs/FAQ! Or perhaps redun could show a warning when there are unused outputs in the graph?
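One way such a warning could work is sketched below in plain Python. This is a hypothetical sketch, not redun code. Every lazy expression built while a task body runs is recorded, and anything not reachable from the returned value is reported. `Lazy`, `reachable` and `run_body` are illustrative names, not part of redun, and this is not how redun is actually implemented.

```python
import warnings


class Lazy:
    """Hypothetical lazy expression that registers itself on creation."""

    created = []  # every expression built during the current task body

    def __init__(self, name, *args):
        self.name = name
        self.args = args
        Lazy.created.append(self)


def reachable(value, seen=None):
    """Return the set of Lazy nodes that `value` depends on."""
    seen = set() if seen is None else seen
    if isinstance(value, Lazy) and value not in seen:
        seen.add(value)
        for arg in value.args:
            reachable(arg, seen)
    return seen


def run_body(body):
    """Run a task body and warn about expressions it built but never used."""
    Lazy.created = []
    result = body()
    used = reachable(result)
    unused = [expr.name for expr in Lazy.created if expr not in used]
    if unused:
        warnings.warn(f"unused lazy expressions: {unused}")
    return result, unused


def main():
    data = Lazy("load_data", "input.csv")
    Lazy("to_csv", data, "data.csv")  # built, then discarded
    return "./data.csv"  # a plain value, not derived from `data`


def main_fixed():
    data = Lazy("load_data", "input.csv")
    return Lazy("save_data", data)  # returned, so everything is used


print(run_body(main)[1])        # ['load_data', 'to_csv'], plus a warning
print(run_body(main_fixed)[1])  # []
```

The check only needs the set of expressions created during the body and a walk of the returned value's dependencies. A real scheduler would also have to handle expressions nested inside lists, dicts and other containers.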
FWIW, this is how I would update the original sample:
```python
from redun import task, File
import pandas as pd

INPUT = File("input.csv")

@task
def load_data(input: File) -> pd.DataFrame:
    return pd.read_csv(input.path) + 1

@task
def save_data(data: pd.DataFrame) -> File:
    data.to_csv("data.csv")
    return File("data.csv")

@task
def main(input: File = INPUT) -> File:
    data = load_data(input)
    ret = save_data(data)
    return ret
```