kubeflow/pipelines

[backend] Error after restoring pipelines from Velero backup


Environment

AWS EKS (Kubernetes version 1.29)

  • How did you deploy Kubeflow Pipelines (KFP)?
    pip install kfp
  • KFP version:
    build version dev_local
  • KFP SDK version:

kfp 1.8.22
kfp-pipeline-spec 0.1.16
kfp-server-api 1.8.5

Steps to reproduce

After setting up Kubeflow and Kubeflow Pipelines, I uploaded and ran a sample pipeline. Everything worked fine. I then created a backup of the kubeflow namespace and the user namespace with Velero (version 5.3.0). Here is a sample command:
velero backup create <backup_name> --include-namespaces <namespace_name>
I also tried an additional command to capture the workflows and scheduled workflows specifically:
velero backup create <backup_name> --include-namespaces <namespace_name> --include-resources workflow,scheduledworkflow
After that I deleted the kubeflow namespace first and then the user namespace. I did it in this order because otherwise Kubeflow recreates the user namespace.
During this operation the PV (used by the Kubeflow MySQL pod) was retained, but it was left in a detached state. To make it available again I used the following command:
kubectl patch pv <pv_name> -p '{"spec":{"claimRef": null}}'
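For anyone who wants to script this step, here is a minimal Python sketch of the same patch using the official `kubernetes` client. The function names `claim_ref_reset_patch` and `release_pv` are hypothetical helpers, not part of any library, and the sketch assumes a working kubeconfig. I have not verified it against this cluster, so check the PV status afterwards.

```python
import json


def claim_ref_reset_patch() -> dict:
    """Build the patch body that clears spec.claimRef on a PV.

    This mirrors the JSON passed to `kubectl patch pv` above.
    """
    return {"spec": {"claimRef": None}}


def release_pv(pv_name: str) -> None:
    """Apply the patch to the named PV (requires cluster access)."""
    # Imported lazily so the patch builder works without the client installed.
    from kubernetes import client, config

    config.load_kube_config()  # assumption: a local kubeconfig is available
    client.CoreV1Api().patch_persistent_volume(
        name=pv_name, body=claim_ref_reset_patch()
    )


if __name__ == "__main__":
    # Print the patch body only; call release_pv("<pv_name>") to apply it.
    print(json.dumps(claim_ref_reset_patch()))
```

Keeping the patch body in its own function makes it easy to confirm that it matches the kubectl command before touching the cluster.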
After that I restored the namespaces with Velero, first the user namespace and then the kubeflow namespace. Sample command:
velero restore create <restore_name> --from-backup <backup_name> --include-namespaces <namespace_name>
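To make the restore order explicit, here is a small Python sketch that only builds the Velero command lines in the order described above (user namespace first, then kubeflow). It uses only the flags already shown in this report. The helper names and the `backup-<ns>` / `restore-<ns>` naming scheme are made up for illustration, and nothing is executed.

```python
from typing import List


def backup_cmd(backup_name: str, namespace: str) -> List[str]:
    """Command line for backing up a single namespace."""
    return ["velero", "backup", "create", backup_name,
            "--include-namespaces", namespace]


def restore_cmd(restore_name: str, backup_name: str, namespace: str) -> List[str]:
    """Command line for restoring a single namespace from a backup."""
    return ["velero", "restore", "create", restore_name,
            "--from-backup", backup_name,
            "--include-namespaces", namespace]


def restore_plan(user_ns: str, kubeflow_ns: str = "kubeflow") -> List[List[str]]:
    """Restore commands in the order used here: user namespace, then kubeflow."""
    # The backup/restore names are hypothetical; substitute the real ones.
    return [
        restore_cmd(f"restore-{ns}", f"backup-{ns}", ns)
        for ns in (user_ns, kubeflow_ns)
    ]


if __name__ == "__main__":
    for cmd in restore_plan("my-user-ns"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` if you want to run the commands from a script.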
I also tried to include the workflows and scheduled workflows in the restore command, but it didn't work.
After restoring, everything worked except the pipelines. The YAML page of the pipeline was empty after the restore.
When I tried to create a run, I got this error:

{"error":"Failed to create a new run.: InvalidInputError: unknown template format: pipeline spec is invalid","code":3,"message":"Failed to create a new run.: InvalidInputError: unknown template format: pipeline spec is invalid","details":[{"@type":"type.googleapis.com/api.Error","error_message":"unknown template format","error_details":"Failed to create a new run.: InvalidInputError: unknown template format: pipeline spec is invalid"}]}

In the past I had a problem with memory (Velero didn't have enough memory), but that issue is now resolved.
My colleague also did a little investigation, and these are his findings:
I created a new pipeline, and removing the corresponding Workflow reproduces our issue. I compared it with the malformed pipeline (testlow) but didn't find any meaningful difference.
In the DB they look the same:

image

I don't see any errors in the logs of the various Kubeflow pods when trying to open the pipeline YAML in the UI.

I also tried to run the pipeline through the API, and the error is the same:
ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'x-powered-by': 'Express', 'content-type': 'application/json', 'date': 'Tue, 24 Sep 2024 12:31:46 GMT', 'content-length': '440', 'x-envoy-upstream-service-time': '11', 'server': 'istio-envoy'})
HTTP response body: {"error":"Failed to create a new run.: InvalidInputError: unknown template format: pipeline spec is invalid","code":3,"message":"Failed to create a new run.: InvalidInputError: unknown template format: pipeline spec is invalid","details":[{"@type":"type.googleapis.com/api.Error","error_message":"unknown template format","error_details":"Failed to create a new run.: InvalidInputError: unknown template format: pipeline spec is invalid"}]}
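The response body is hard to read as one line, so here is a small Python sketch that pulls out the relevant fields. The body is copied verbatim from the output above; `summarize_kfp_error` is a hypothetical helper name. Code 3 is the gRPC status `INVALID_ARGUMENT`, which is consistent with the HTTP 400.

```python
import json

# Response body copied verbatim from the API call above.
BODY = '{"error":"Failed to create a new run.: InvalidInputError: unknown template format: pipeline spec is invalid","code":3,"message":"Failed to create a new run.: InvalidInputError: unknown template format: pipeline spec is invalid","details":[{"@type":"type.googleapis.com/api.Error","error_message":"unknown template format","error_details":"Failed to create a new run.: InvalidInputError: unknown template format: pipeline spec is invalid"}]}'


def summarize_kfp_error(body: str) -> dict:
    """Extract the status code, message and first detail from the error body."""
    payload = json.loads(body)
    details = payload.get("details") or [{}]
    return {
        "code": payload.get("code"),
        "message": payload.get("message"),
        "error_message": details[0].get("error_message"),
    }


if __name__ == "__main__":
    for key, value in summarize_kfp_error(BODY).items():
        print(f"{key}: {value}")
```

The only detail the server returns is "unknown template format", so the response itself does not say which part of the stored pipeline spec is missing or malformed.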

Expected result

The pipelines should show their YAML section and launch without problems.

Materials and Reference

Any sample pipeline will work as a demo, but here is the simple pipeline that I used.

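import kfp
# MetadataSpec and ComponentSpec were used below without being imported.
from kfp.components.structures import ComponentSpec, MetadataSpec
from kubernetes.client.models.v1_toleration import V1Toleration

# Toleration added to the pod of the pipeline step.
toleration = V1Toleration(effect='NoSchedule', key='ComputeResources', value='reservedFor')

def sample_op():
    # Sleep for a long time so the run stays active, then print a marker.
    from time import sleep
    sleep(100000)

    print(123)

# Build a container component from the function and write its spec to localFunc.yaml.
sample_comp = kfp.components.func_to_container_op(
    func=sample_op,
    base_image='python:3.10-slim-buster',
    output_component_file='localFunc.yaml'
)
ms = MetadataSpec(annotations={'test1': 'test1'}, labels={'test2': 'test2'})
cs = ComponentSpec(metadata=ms)
kfp.components.load_component(filename='localFunc.yaml')

@kfp.dsl.pipeline(
    name='ppln-from-vsc',
    description='A pipeline'
)
def ppln_from_vsc():
    # Single step with resource requests/limits, a toleration and a pod label.
    (sample_comp()
        .set_memory_request('8000Mi')
        .set_memory_limit('8000Mi')
        .set_cpu_request('50m')
        .set_cpu_limit('100m')
        .add_toleration(toleration)
        .add_pod_label('high-pipeline', 'high-pipeline'))

print(sample_comp().component_ref)

if __name__ == '__main__':
    # Compile the pipeline to a YAML file next to this script.
    kfp.compiler.Compiler().compile(ppln_from_vsc, __file__ + '.yaml')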
