kubeflow/testing

Kubernetes Version Upgrade

PatrickXYS opened this issue · 7 comments

For the Ingress k8s object, we use NetworkingV1beta1Api to check its status here:

    net_client_apps = k8s_client.NetworkingV1beta1Api(api_client)

But this API is only supported starting with k8s python-client v10.0.0.a1, and our k8s client is pinned to v9.0.0 here:

    # See https://github.com/kubeflow/gcp-blueprints/issues/52#issuecomment-645446088
    # our libs seem to break with 11.0.0
    kubernetes==9.0.0 \
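
A quick way to confirm the mismatch inside the test image is to check whether the installed client module exposes the class at all. The sketch below is only an illustration: `has_networking_v1beta1` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins for an old and a new client module so the snippet runs without the kubernetes package installed.

```python
import types


def has_networking_v1beta1(client_module):
    """Return True if the given client module exposes NetworkingV1beta1Api."""
    return hasattr(client_module, "NetworkingV1beta1Api")


# Stand-ins for illustration only (not the real kubernetes.client module).
old_client = types.SimpleNamespace()
new_client = types.SimpleNamespace(NetworkingV1beta1Api=object)

print(has_networking_v1beta1(old_client))
print(has_networking_v1beta1(new_client))
```

In the real image, the same check would presumably be `has_networking_v1beta1(k8s_client)` after `from kubernetes import client as k8s_client`.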

Looks like @jlewi ran into some issue with v11.0.0 before. I think we need to consider upgrading the k8s client version, or simply disable the ALB ingress health check for now.

    ingress_names = ["istio-ingress"]
    # Check if Ingress is Ready and Healthy
    if platform in ["aws"]:
        for ingress_name in ingress_names:
            logging.info("Verifying that ingress %s started...", ingress_name)
            util.wait_for_ingress(api_client, ingress_namespace, ingress_name, 10)

@PatrickXYS Any concerns about upgrading the k8s python client? It would be great to pick the right client version according to the CI Kubernetes version. The test container has not been upgraded for a long time. I would suggest doing an upgrade to fix the issue.
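
Picking the client according to the cluster version could look roughly like the sketch below. The mapping is an assumption based on the compatibility matrix in the python client's README (client 9.x for Kubernetes 1.13, 10.x for 1.14, 11.x for 1.15, 12.x for 1.16) and should be verified there before use. `pick_client_requirement` is a hypothetical helper, not something that exists in this repo.

```python
# Assumed mapping: Kubernetes minor version -> python client major version.
# Verify against the client's compatibility matrix before relying on it.
CLIENT_MAJOR_FOR_K8S_MINOR = {13: 9, 14: 10, 15: 11, 16: 12}


def pick_client_requirement(server_minor, mapping=CLIENT_MAJOR_FOR_K8S_MINOR):
    """Return a pip requirement string for the given cluster minor version.

    Managed clusters may report the minor version with a suffix such as
    "15+", so only the digits are kept.
    """
    digits = "".join(ch for ch in str(server_minor) if ch.isdigit())
    minor = int(digits)
    if minor not in mapping:
        raise ValueError("No known client version for Kubernetes 1.%d" % minor)
    major = mapping[minor]
    return "kubernetes>={0}.0.0,<{1}.0.0".format(major, major + 1)


print(pick_client_requirement("15+"))
print(pick_client_requirement(13))
```

The cluster's minor version can be read with `k8s_client.VersionApi(api_client).get_code().minor`.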

jlewi commented

It might be worth seeing if we can get rid of run_e2e_workflow.py altogether.
Prow is adding:

  • native support for Tekton.
  • Defining Jobs inside the repo itself.

It would be great to rely on that and get rid of run_e2e_workflow.py

I would also suggest looking for ways to replace Python with Go; it will be more maintainable long term.

We have two CI clusters

  • kubeflow-testing
    • This is where the prow pods run
  • kf-ci-v1
    • This is where Tekton runs

We created kf-ci-v1 because kubeflow-testing was too old to install Tekton. I'm not sure why we didn't just upgrade kubeflow-testing, but at some point in the future we might want to consider consolidating down to a single cluster. As part of that, we might want to switch to a regional cluster for higher availability.

@Jeffwan That makes sense. I'll go ahead and upgrade the client, build a new image, and run the tests there.

Prow is adding:

  • native support for Tekton.
  • Defining Jobs inside the repo itself.

It would be great to rely on that and get rid of run_e2e_workflow.py

That would be great, so I'll keep an eye on Prow's feature support and help with the migration.

I would also suggest looking for ways to replace python with go

I think we'll benefit in the long term. But this does need some discussion with the community, because some of the repos rely on our Python script.

we might want to consider consolidating down to a single cluster

If Prow adds native support for Tekton in the future, I'm happy to consolidate into a single cluster afterwards.

@jlewi

/close

@PatrickXYS: Closing this issue.
