flyteorg/flyte

[Housekeeping] Task resolver is not aware of the module when executed as top-level code environment.

YmirKhang opened this issue · 3 comments

Describe the issue

Trying to use a self-contained file to register tasks results in the following error. If the file is executed as the top-level code environment, there needs to be a way to make the workflows and tasks aware of the module they are defined in:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "failed to compile workflow for [resource_type:WORKFLOW project:"flytesnacks" domain:"development" name:"sparkapp.workflows.example.my_spark" version:"9be88974-3006-45ef-99c6-3d0df6b83376" ] with err missing entity of type TASK with identifier project:"flytesnacks" domain:"development" name:"__main__.print_every_time" version:"9be88974-3006-45ef-99c6-3d0df6b83376" "
        debug_error_string = "{"created":"@1636542049.376253325","description":"Error received from peer ipv4:127.0.0.1:56651","file":"src/core/lib/surface/call.cc","file_line":1070,"grpc_message":"failed to compile workflow for [resource_type:WORKFLOW project:"flytesnacks" domain:"development" name:"sparkapp.workflows.example.my_spark" version:"9be88974-3006-45ef-99c6-3d0df6b83376" ] with err missing entity of type TASK with identifier project:"flytesnacks" domain:"development" name:"__main__.print_every_time" version:"9be88974-3006-45ef-99c6-3d0df6b83376" ","grpc_status":13}"

Further discussion about this issue can be found here
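
The `__main__.print_every_time` identifier in the error above is consistent with how Python names functions defined in a file that is run directly. A minimal sketch of that behavior, using plain Python and no flytekit: the helper `qualified_name` is hypothetical and simulates the two execution modes by setting `__name__` in the namespace where the function is defined.

```python
def qualified_name(module_name: str) -> str:
    """Define a function in a namespace whose __name__ is `module_name`
    and return the dotted name a resolver would derive from it."""
    namespace = {"__name__": module_name}
    # A function takes its __module__ from the __name__ of the globals
    # it is defined in, so this mimics defining it inside that module.
    exec("def print_every_time():\n    pass\n", namespace)
    fn = namespace["print_every_time"]
    return f"{fn.__module__}.{fn.__name__}"


# File executed directly (python example.py): the module is "__main__".
as_script = qualified_name("__main__")

# Same file imported as a module: the real dotted path is used.
as_module = qualified_name("sparkapp.workflows.example")

print(as_script)
print(as_module)
```

An entity registered under the first name cannot be found by anything that looks it up under the second, which matches the "missing entity of type TASK" failure.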

What if we do not do this?

This has no effect on standard workflows for now. However, we have proposed defining and registering workflows and tasks in the same file. The following could be implemented with a module-aware remote or resolver object:

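import click
from flytekit import task, workflow

@task
def task1():
    pass

@task
def task2():
    pass

@workflow
def my_wf():
    pass


@click.option(...)
...
def package_and_register():
    # build images

    with tmp_proxy() ...:

        # register every entity defined in this file;
        # `remote`, `tmp_proxy` and `launchplan` are placeholders from the proposal
        for item in [task1, task2, my_wf, launchplan]:
            remote.register(item)

if __name__ == "__main__":
    package_and_register()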

Related component(s)

flytekit.remote, flytekit.core

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

@YmirKhang thank you for the issue. Are you planning on helping out with this?

cc @eapolinario / @YmirKhang does the new module resolution PR I put in help with this in any way? Sorry, I do not follow the exact problem.

As of flytekit v0.32.0, flytekit takes care of locating entities by module name (and excludes prefixes like __main__), which is essentially what's needed to implement the idea in this issue.
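
To illustrate the general idea of replacing a `__main__` prefix with a real module name, here is a rough sketch. It is not flytekit's implementation: `resolve_module_name` and `entity_name` are hypothetical helpers, and the approach (walking up the directory tree while an `__init__.py` is present) is only one assumed way to recover a dotted path from a source file.

```python
import tempfile
from pathlib import Path


def resolve_module_name(file_path: str) -> str:
    """Build a dotted module name by walking up from `file_path`
    for as long as the parent directory is a package (has __init__.py)."""
    path = Path(file_path).resolve()
    parts = [path.stem]
    parent = path.parent
    while (parent / "__init__.py").exists() and parent != parent.parent:
        parts.append(parent.name)
        parent = parent.parent
    return ".".join(reversed(parts))


def entity_name(fn, source_file: str) -> str:
    """Return `module.function`, swapping `__main__` for the resolved module."""
    module = fn.__module__
    if module == "__main__":
        module = resolve_module_name(source_file)
    return f"{module}.{fn.__name__}"


# Demo: build a throwaway package layout sparkapp/workflows/example.py.
with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / "sparkapp" / "workflows"
    pkg.mkdir(parents=True)
    (Path(root) / "sparkapp" / "__init__.py").touch()
    (pkg / "__init__.py").touch()
    script = pkg / "example.py"
    script.touch()

    def print_every_time():
        pass

    # Simulate the function having been defined in a directly executed file.
    print_every_time.__module__ = "__main__"
    resolved = entity_name(print_every_time, str(script))

print(resolved)
```

With a name resolved this way, tasks registered from a directly executed file would share the same prefix as the workflow that references them.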