tensorflow/tfx

with_platform_config ignored for ImportExampleGen for KubeflowV2DagRunner

xivarri opened this issue · 4 comments

This commit added support for using with_platform_config with KubeflowV2DagRunner when running on Vertex. However, the logic only runs in _build_container_spec in step_builder.py, so it is never applied to Importer/FileBasedExampleGen nodes. As a result, adding with_platform_config to e.g. ImportExampleGen is not reflected in the output JSON.

@xivarri, can you try using the code below on the ImportExampleGen or FileBasedExampleGen components to specify CPU and RAM on Vertex, as shown in a similar issue?

Thank you!

from kfp.pipeline_spec import pipeline_spec_pb2 as pipeline_pb2

# MyComponent stands in for the component to configure, e.g. ImportExampleGen.
my_component = MyComponent().with_platform_config(
    pipeline_pb2.PipelineDeploymentConfig.PipelineContainerSpec
    .ResourceSpec(cpu_limit=2.0, memory_limit=4.0))

Yeah, that's what I did, but it does not get reflected in the output JSON. I had to use this hack from that issue to manually modify the JSON:

    for component_executor_spec in data["pipelineSpec"]["deploymentSpec"][
        "executors"
    ].values():
        if "resources" not in component_executor_spec["container"]:
            component_executor_spec["container"]["resources"] = {
                "cpuLimit": 32.0,
                "memoryLimit": 128.0,
            }

(I also could not limit this to only the ImportExampleGen executor: when I did that, Vertex would still use e2-standard-4 for it, so I set it for all of them.)
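For anyone who needs the same workaround, here is a minimal, self-contained sketch of the hack above as a function. It assumes the compiled pipeline JSON has the layout used in the snippet (`pipelineSpec` > `deploymentSpec` > `executors`). The function name `add_default_resources`, its parameters, and the file name `pipeline.json` are hypothetical and not part of TFX or KFP. Unlike the snippet above, it skips executors that have no `container` entry instead of raising a `KeyError`.

```python
import json
from typing import Any, Dict


def add_default_resources(
    pipeline_json: Dict[str, Any],
    cpu_limit: float,
    memory_limit: float,
) -> int:
    """Set resources on every container executor that has none.

    Hypothetical helper, not a TFX or KFP API. Mutates pipeline_json
    in place and returns the number of executors that were patched.
    """
    executors = pipeline_json["pipelineSpec"]["deploymentSpec"]["executors"]
    patched = 0
    for executor_spec in executors.values():
        container = executor_spec.get("container")
        if container is None:
            # Assumption: non-container executors are left untouched.
            continue
        if "resources" not in container:
            container["resources"] = {
                "cpuLimit": cpu_limit,
                "memoryLimit": memory_limit,
            }
            patched += 1
    return patched


def patch_pipeline_file(path: str, cpu_limit: float, memory_limit: float) -> int:
    """Load a compiled pipeline JSON file, patch it, and write it back."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    patched = add_default_resources(data, cpu_limit, memory_limit)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return patched
```

Calling `patch_pipeline_file("pipeline.json", 32.0, 128.0)` after the runner has written the file would apply the same values as the hack above. Executors that already carry a `resources` block, for example from with_platform_config, are not overwritten.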

@xivarri, thank you for sharing these insights. We will work on this internally and update the thread.

Does Kubeflow support .json files? Since when?
As far as I know, it only supports yaml, zip, and tar.gz files.

How are you able to run pipeline JSON files in Kubeflow?

@xivarri