Kubeflow dashboard returns 403 Forbidden

Question

IoanaEmma opened this issue 3 years ago · 6 comments

I have a problem with the Kubeflow dashboard. Until now I could connect to it without problems, but after restarting the PC I get 403 Forbidden when I try to connect from my browser to http://10.64.140.43.nip.io (the URL I received after Kubeflow was enabled).

I'm working on an Ubuntu 20.04.2 LTS machine where I installed microk8s 1.20.10 following the steps at https://microk8s.io/docs and then enabled the Kubeflow (v1.2) addon. When I check the Kubeflow pods, I can see that the dex-auth pod sometimes crashes with the following error:
"failed to list custom resource authcodes.dex.coreos.com, attempting to create: Get "https://[10.152.183.1]:443/apis/dex.coreos.com/v1/namespaces/kubeflow/authcodes": dial tcp 10.152.183.1:443: connect: connection refused"

After a few restarts of the dex-auth deployment the pod is successfully Running, but the dashboard still does not work. This is not the first time I have encountered this issue, and the only solution I could find was to reinstall microk8s.

I also installed Knative v0.24, Istio for Knative v0.24, and KFServing v0.5 for model prediction. From what I can see, after the restart the Kubeflow dashboard can be accessed at http://10.64.140.44.nip.io, but it shows only the main page; when I try to navigate to the experiments, runs, or pipelines pages, this message appears: no healthy upstream.

I am new to Kubernetes and Kubeflow, so any help or suggestion as to what I might do to solve this problem is welcome. Thanks.

inspection-report.tar.gz

Answer 1 · 2021-10-19T14:24:13.000Z

Sorry for the slow response, I didn't notice this until reading #2665. We are currently updating our charms to provide Kubeflow v1.4 (released this month), which might help with some of these issues. We will probably have to see whether they persist after that update. Are you still encountering the issues listed here? If not, any solutions you found along the way would be helpful.

Answer 2 · 2021-10-19T14:46:01.000Z

Reading #2665 more closely, I think you are still seeing the issue. I have a few vague guesses but nothing firm yet. I wonder whether, after a few days, a pod restarts or something similar and the IP used for the dashboard gets messed up (which would require a new juju config dex-auth ... to set it). Does canonical/dex-auth-operator#17 sound like what you're experiencing?

If you have a deployment up with this issue, for the apps involved (maybe dashboard, profile, dex-auth, and oidc-gatekeeper, and any others you think relevant) could you please post their:

juju debug-log -i APPLICATION --replay (this is also in the inspect tarball, but if we export each app separately we get more history)
juju show-status-log for the same applications
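
If it helps, the two commands above could be wrapped in a small loop so each application's output lands in its own file. This is only a sketch under a few assumptions: the function name collect_logs, the JUJU and OUT_DIR variables, and the output file names are made up for illustration; --no-tail is added so that debug-log exits after replaying instead of following the log; and --type application is my guess at the right status type, since show-status-log defaults to units. Check juju status for the exact application names before running it.

```shell
#!/usr/bin/env bash
# Sketch: save the replayed debug log and the status history of each application.
# JUJU and OUT_DIR are hypothetical knobs; set JUJU=echo to preview the commands.
JUJU="${JUJU:-juju}"
OUT_DIR="${OUT_DIR:-./juju-logs}"

collect_logs() {
    mkdir -p "$OUT_DIR"
    for app in "$@"; do
        # --no-tail (an addition to the command above) stops after the existing messages.
        "$JUJU" debug-log -i "$app" --replay --no-tail > "$OUT_DIR/$app-debug.log" 2>&1
        # --type application is an assumption; drop it and pass a unit name if it is rejected.
        "$JUJU" show-status-log --type application "$app" > "$OUT_DIR/$app-status.log" 2>&1
    done
}

# Example call (application names must match what `juju status` reports):
# collect_logs dex-auth oidc-gatekeeper
```

The resulting files in OUT_DIR can then be attached to the issue one per application.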

Answer 3 · 2021-10-19T17:16:50.000Z

Forgot to mention that I've seen the no healthy upstream message before when trying to access a notebook pod that is not fully up. There's a small bug in Kubeflow where the notebook page reports a notebook as up when the pod is alive rather than when the pod is actually serving the notebook, so for a few seconds you can click connect when it isn't actually ready. That makes me think maybe for the screenshot the pod for the KFP experiments is down?
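
One rough way to check for pods that are up but not ready is to filter the pod listing, as in the sketch below. It assumes the kubeflow namespace and the microk8s kubectl wrapper from your setup; the function names and the KUBECTL variable are made up for illustration, and the filter only looks at the READY and STATUS columns, so treat its output as a hint rather than a diagnosis.

```shell
#!/usr/bin/env bash
# Sketch: list pods that are not fully ready in a namespace (default: kubeflow).
# KUBECTL is a hypothetical override, e.g. KUBECTL=kubectl on other clusters.
KUBECTL="${KUBECTL:-microk8s kubectl}"

# Reads `kubectl get pods --no-headers` lines (NAME READY STATUS ...) on stdin and
# prints pods whose ready count differs from the total or whose status is not Running.
# Completed pods are skipped because they are expected to show 0/N.
filter_not_ready() {
    awk '$3 != "Completed" {
        split($2, r, "/")
        if (r[1] != r[2] || $3 != "Running") print $1, $2, $3
    }'
}

not_ready_pods() {
    # $KUBECTL is left unquoted on purpose so "microk8s kubectl" splits into two words.
    $KUBECTL get pods -n "${1:-kubeflow}" --no-headers | filter_not_ready
}

# Example call:
# not_ready_pods kubeflow
```

If the pod behind the experiments page shows up in that list, that would fit the no healthy upstream message.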

Answer 4 · 2021-10-20T06:28:28.000Z

I no longer have a deployment up with this issue. As I said in #2665, all pods are down and microk8s stopped running. I will try to reproduce the problem and come back with the logs. In the meantime, canonical/microk8s.io#446 has the inspect tarball (generated after the pods started crashing) attached, so maybe you can take a look there.

Also, do you know why the dashboard could first be accessed at http://10.64.140.43.nip.io and then changed to http://10.64.140.44.nip.io? Is this behavior normal?

Answer 5 · 2021-10-20T13:20:06.000Z

I managed to reproduce the issue. Here are the logs from dex-auth and oidc-gatekeeper.

dex-auth.txt
oidc-gatekeeper.txt

Answer 6 · 2022-11-22T13:57:04.000Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.