OpenUnison/openunison-k8s

Orchestra pods started giving the error "ERROR K8sSessionStore - Could not search k8s" and the URL started returning a Tremolo error

shnigam2 opened this issue · 12 comments

The error details we observed in the orchestra pod logs are as below:

[2023-06-30 06:03:01,087][Thread-8] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
[2023-06-30 06:03:01,091][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x3eca7c10-9267-4dba-b15f-7feca5cd6b28x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,104][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/xbcd3b484-a5ef-4c24-bc1f-2c4a9fd24123x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,113][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/xa821d927-c905-4691-9a33-ba69b300edb2x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,122][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/xcdb7de08-d2ed-4a54-b00d-e2e80fdde73fx' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,135][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x42afb20e-a193-4a40-90ab-1b586cd220bfx' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,148][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x33b35ec8-7dd5-4ca6-b80e-213c610f9034x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,160][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x782dab93-41c8-4f94-a9ce-61c4a4062a55x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,172][Thread-15] WARN  OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x1ae33343-476c-4077-927f-c8a2151f56b7x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

[2023-06-30 06:03:01,173][Thread-15] WARN  SessionManagerImpl - Clearing 7 sessions
[2023-06-30 06:03:01,960][Thread-9] INFO  K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/orgs?watch=true&timeoutSecond=25
[2023-06-30 06:03:01,972][Thread-9] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
[2023-06-30 06:03:03,162][XNIO-1 task-4] INFO  AccessLog - [AzSuccess] - CheckAlive - https://127.0.0.1:8443/check_alive - uid=Anonymous,o=Tremolo - 
         [127.0.0.1] - [f7b58659eceeaa2589cc99c52a5aefe5417d809fa]
[2023-06-30 06:03:03,178][XNIO-1 task-4] INFO  AccessLog - [AzSuccess] - k8sIdp - https://127.0.0.1:8443/auth/idp/k8sIdp/.well-known/openid-configuration - uid=Anonymous,o=Tremolo - NONE [127.0.0.1] - [f9d91ca5e7b85abf8b18de00820e76cbe0023929b]
[2023-06-30 06:03:04,037][Thread-18] INFO  K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/trusts?watch=true&timeoutSecond=25
[2023-06-30 06:03:04,063][Thread-18] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
[2023-06-30 06:03:04,175][Thread-14] INFO  K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/authmechs?watch=true&timeoutSecond=25
[2023-06-30 06:03:04,188][Thread-14] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null

The attached Tremolo error screenshot shows what users see once these errors start appearing. To recover, we currently just restart the openunison object by adding a dummy annotation (a rough sketch of that workaround is included below). We are looking for the root cause and a permanent fix.
[Screenshot: Tremolo error page, 2023-07-27]
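
For reference, the restart workaround we apply today is roughly the following (a sketch only: the annotation key is a made-up placeholder, and we assume the OpenUnison custom resource keeps its default name orchestra; any change to the object makes the operator roll the orchestra pods):

# dummy annotation bump; the timestamp just guarantees the object actually changes
kubectl annotate openunisons.openunison.tremolo.io orchestra -n openunison \
  ops.example.com/restarted-at="$(date +%s)" --overwrite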

@mlbiam Hi Marc, could you please take a look at this? It is happening frequently these days.

mlbiam commented

Please provide:

  1. OpenUnison version - You can get this from the beginning of the logs
  2. Kubernetes version and platform (i.e. EKS, kubeadm, etc.)
  3. Your values.yaml
  4. Installation method

@mlbiam Please find below the main error observed in the orchestra pod logs:

[2023-10-13 22:08:00,011][local_Worker-3] ERROR K8sSessionStore - Could not search k8s
java.lang.NullPointerException: null
	at com.tremolosecurity.oidc.k8s.K8sSessionStore.cleanOldSessions(K8sSessionStore.java:284) [unison-applications-k8s-1.0.24.jar:?]
	at com.tremolosecurity.idp.providers.OpenIDConnectIdP.clearExpiredSessions(OpenIDConnectIdP.java:2215) [unison-idp-openidconnect-1.0.24.jar:?]
	at com.tremolosecurity.idp.providers.oidc.model.jobs.ClearSessions.execute(ClearSessions.java:47) [unison-idp-openidconnect-1.0.24.jar:?]
	at com.tremolosecurity.provisioning.scheduler.UnisonJob.execute(UnisonJob.java:57) [unison-sdk-1.0.24.jar:?]
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-2.3.2.jar:?]
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [quartz-2.3.2.jar:?]
[2023-10-13 22:08:00,011][local_Worker-3] ERROR OpenIDConnectIdP - Could not clear sessions
java.lang.Exception: Error searching kubernetes
	at com.tremolosecurity.oidc.k8s.K8sSessionStore.cleanOldSessions(K8sSessionStore.java:302) ~[unison-applications-k8s-1.0.24.jar:?]
	at com.tremolosecurity.idp.providers.OpenIDConnectIdP.clearExpiredSessions(OpenIDConnectIdP.java:2215) [unison-idp-openidconnect-1.0.24.jar:?]
	at com.tremolosecurity.idp.providers.oidc.model.jobs.ClearSessions.execute(ClearSessions.java:47) [unison-idp-openidconnect-1.0.24.jar:?]
	at com.tremolosecurity.provisioning.scheduler.UnisonJob.execute(UnisonJob.java:57) [unison-sdk-1.0.24.jar:?]
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-2.3.2.jar:?]
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [quartz-2.3.2.jar:?]
Caused by: java.lang.NullPointerException
	at com.tremolosecurity.oidc.k8s.K8sSessionStore.cleanOldSessions(K8sSessionStore.java:284) ~[unison-applications-k8s-1.0.24.jar:?]
	... 5 more
[2023-10-13 22:08:01,237][Thread-11] INFO  K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/resultgroups?watch=true&timeoutSecond=25
[2023-10-13 22:08:01,252][Thread-11] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null

OpenUnison version

k logs openunison-orchestra-d7d9dc9fd-c7dgb -n openunison | grep -i version
[2023-10-14 01:26:32,137][main] INFO  xnio - XNIO version 3.8.4.Final
[2023-10-14 01:26:32,246][main] INFO  nio - XNIO NIO Implementation Version 3.8.4.Final
  Version: V3
[2023-10-14 01:26:44,747][main] INFO  StdSchedulerFactory - Quartz scheduler version: 2.3.2
[2023-10-14 01:26:57,641][main] INFO  threads - JBoss Threads version 2.3.3.Final

Kubernetes version and platform (i.e. EKS, kubeadm, etc.) - kubeadm & EKS 1.24.10
Your values.yaml:

     source:
        repoURL: https://nexus.tremolo.io/repository/helm
        targetRevision: 2.3.34
        chart: orchestra-login-portal-argocd
        helm:
          releaseName: openunison
          values: |
            image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-k8s:sdfedgtjhkrghkghdft
 
            operator:
              image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-kubernetes-operator:dfghktyirtyritg
              validators: []
              mutators: []
 
            network:
              openunison_host: "login-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              dashboard_host: "dashboard-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              api_server_host: "ou-api-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              session_inactivity_timeout_seconds: 36000
              k8s_url: ''
              force_redirect_to_tls: false
              createIngressCertificate: false
              ingress_type: none
              ingress_certificate: ou-tls-main-certificate
              ingress_annotations:
                certmanager.k8s.io/cluster-issuer: letsencrypt
                kubernetes.io/ingress.class: nginx
 
            cert_template:
              ou: "Kubernetes"
              o: "MyOrg"
              l: "My Cluster"
              st: "State of Cluster"
              c: "MyCountry"
 
            myvd_config_path: "WEB-INF/myvd.conf"
            k8s_cluster_name: "{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
            enable_impersonation: true
            cert_update_image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-kubernetes-operator:dfghktyirtyritg
 
            impersonation:
              jetstack_oidc_proxy_image: our-repo-cngccp-docker-k8s.jfrog.io/kube-oidc-proxy:eretrtyi45864856834e
              use_jetstack: true
              explicit_certificate_trust: true
              ca_secret_name: ou-tls-certificate
 
            dashboard:
              namespace: "kubernetes-dashboard"
              cert_name: "kubernetes-dashboard-certs"
              label: "k8s-app=kubernetes-dashboard"
              service_name: kubernetes-dashboard
              require_session: true
 
            certs:
              use_k8s_cm: false
 
            trusted_certs: []
 
            monitoring:
              prometheus_service_account: system:serviceaccount:monitoring:prometheus-k8s
 
            oidc:
              client_id: "{{metadata.annotations.oidc_client_id}}"
              issuer: https://e52416c3-mckid-us.okta.com
              user_in_idtoken: false
              domain: ""
              scopes: openid email profile groups
              claims:
                sub: sub
                email: email
                given_name: given_name
                family_name: family_name
                display_name: name
                groups: groups
 
            network_policies:
              enabled: false
              ingress:
                enabled: false
              monitoring:
                enabled: false
              apiserver:
                enabled: false
 
            services:
              pullSecret: "jfrog-auth"
              enable_tokenrequest: false
              token_request_audience: api
              token_request_expiration_seconds: 14400
              node_selectors: []
              resources:
                limits:
                  cpu: 500m
                  memory: 2050Mi
                requests:
                  cpu: 200m
                  memory: 1024Mi
 
            openunison:
              replicas: 2
              non_secret_data:
                K8S_DB_SSO: oidc
                PROMETHEUS_SERVICE_ACCOUNT: system:serviceaccount:monitoring:prometheus-k8s
              secrets: []
              html:
                image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-k8s-html:d38662bde4ea41efab695a41cb4fc9766a39ece99f9ea4fed2a9ffac7670c0a2
                prefix: openunison
              enable_provisioning: false

     source:
        repoURL: https://nexus.tremolo.io/repository/helm/
        targetRevision: 1.0.24
        chart: openunison-k8s-login-oidc
        helm:
          releaseName: orchestra
          values: |
            deployment_data:
              pull_secret: jfrog-auth
            enable_impersonation: true
            image: "our-repo-cngccp-docker-k8s.jfrog.io/openunison-k8s-login-oidc:sderfdserfd"
            impersonation:
              ca_secret_name: ou-tls-certificate
              explicit_certificate_trust: true
              jetstack_oidc_proxy_image: our-repo-cngccp-docker-k8s.jfrog.io/kube-oidc-proxy:swedfrty
              use_jetstack: true
            k8s_cluster_name: "{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
            myvd_configmap: ''
            network:
              api_server_host: "ou-api-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              createIngressCertificate: false
              dashboard_host: "dashboard-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              ingress_annotations:
                certmanager.k8s.io/cluster-issuer: letsencrypt
                kubernetes.io/ingress.class: openunison
              k8s_url: ''
              openunison_host: "login-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
              session_inactivity_timeout_seconds: 36000
              ingress_type: none
            oidc:
              auth_url: https://ourorg-orgid-us.okta.com/oauth2/v1/authorize
              client_id: "{{metadata.annotations.oidc_client_id}}"
              token_url: https://ourorg-orgid-us.okta.com/oauth2/v1/token
              user_in_idtoken: false
              userinfo_url: https://ourorg-orgid-us.okta.com/oauth2/v1/userinfo
            openunison:
              replicas: 2
            services:
              pullSecret: jfrog-auth
              resources:
                limits:
                  cpu: 500m
                  memory: 2050Mi
                requests:
                  cpu: 200m
                  memory: 1024Mi
              token_request_expiration_seconds: 14400
            trusted_certs: []

Installation method - Helm

@mlbiam Could you please help us with this issue? We are seeing it frequently on our clusters, and it is causing unnecessary user noise.

mlbiam commented

I need the version of OpenUnison. If you're not using one of our versioned images, please get it from the first line of the logs. It will look like: OpenUnisonOnUndertow - Starting OpenUnison on Undertow 1.0.38-2023072501
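
Something along these lines should pull that banner out; the deployment name openunison-orchestra is assumed from the pod name you posted earlier:

kubectl logs -n openunison deployment/openunison-orchestra | grep 'Starting OpenUnison on Undertow'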

The stack trace looks like it's associated with an old OpenUnison image. Can you also give the original source image labels? The references to your internal jfrog registry don't really give me any information.

Thanks

@mlbiam It is as below:

OpenUnisonOnUndertow - Starting OpenUnison on Undertow 1.0.24-2021110502

mlbiam commented

That's almost two years old. I doubt it would work with the 2.3.34 orchestra-login-portal-argocd helm chart. Based on the error message, I think what's happening is that the service account token in the container is expiring. It works after a restart because you're getting a new token.
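
If you want to sanity-check that theory, one rough way is to decode the service account token mounted in the pod and look at its issued/expiry claims; the kubelet rotates the file on disk, but a process that only read it at startup keeps presenting its old copy until the pod restarts. A sketch, assuming python3 on your workstation and the openunison-orchestra deployment name:

TOKEN=$(kubectl exec -n openunison deployment/openunison-orchestra -- \
  cat /var/run/secrets/kubernetes.io/serviceaccount/token)

python3 - "$TOKEN" <<'PY'
import base64, json, sys, time
payload = sys.argv[1].split('.')[1]
payload += '=' * (-len(payload) % 4)   # restore the stripped base64 padding
claims = json.loads(base64.urlsafe_b64decode(payload))
exp = claims.get('exp')
print('issued :', time.ctime(claims['iat']) if 'iat' in claims else 'n/a')
print('expires:', time.ctime(exp) if exp else 'n/a (legacy, non-expiring token)')
PY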

Since you're already using a modern chart, I'd suggest using the 1.0.37 version of the container - ghcr.io/openunison/openunison-k8s:1.0.37
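
In the values you posted for the orchestra-login-portal-argocd chart, that would just mean pointing the top-level image key at the upstream tag instead of your internal mirror, for example:

image: ghcr.io/openunison/openunison-k8s:1.0.37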

Hi @mlbiam, one correction: we are still using the old Helm charts and are working on testing the upgraded ones. Could you please check the configuration below and see whether we can fix this issue on the version we are currently using?

We are using multiple Application objects for openunison and orchestra. Below are the ArgoCD objects with Helm values.

Openunison

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: '-2'
  name: openunison
  namespace: argocd
spec:
  destination:
    namespace: openunison
    server: 'https://xxxxxxxx'
  project: cnt
  source:
    chart: openunison-operator
    helm:
      releaseName: openunison
      values: |-
        {
          "image": "xxxxxxxx/openunison-k8s-operator:xxxxxxxx",
          "services": {
            "pullSecret": "jfrog-auth"
          }
        }
    repoURL: 'https://nexus.tremolo.io/repository/helm/'
    targetRevision: 2.0.6
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - ApplyOutOfSyncOnly=true

Orchestra

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orchestra
  namespace: argocd
spec:
  destination:
    namespace: openunison
    server: 'https://xxxxxxxx'
  project: cnt
  source:
    chart: openunison-k8s-login-oidc
    helm:
      releaseName: orchestra
      values: |-
        {
          "cert_template": {
            "c": "xxxxxxxx",
            "l": "xxxxxxxx",
            "o": "dev",
            "ou": "xxxxxxxx",
            "st": "xxxxxxxx"
          },
          "deployment_data": {
            "pull_secret": "jfrog-auth"
          },
          "enable_impersonation": true,
          "image": "xxxxxxxx/openunison-k8s-login-oidc:xxxxxxxx",
          "impersonation": {
            "ca_secret_name": "xxxxxxxx",
            "explicit_certificate_trust": true,
            "jetstack_oidc_proxy_image": "xxxxxxxx/kube-oidc-proxy:xxxxxxxx",
            "oidc_tls_secret_name": "tls-certificate",
            "use_jetstack": true
          },
          "k8s_cluster_name": "xxxxxxxx",
          "myvd_configmap": "",
          "network": {
            "api_server_host": "dev-ou-api.com",
            "createIngressCertificate": false,
            "dashboard_host": "dev-dashboard.com",
            "ingress_annotations": {
              "certmanager.k8s.io/cluster-issuer": "letsencrypt",
              "kubernetes.io/ingress.class": "openunison"
            },
            "ingress_certificate": "",
            "ingress_type": "none",
            "k8s_url": "",
            "openunison_host": "dev-login.com",
            "session_inactivity_timeout_seconds": xxxxxxxx
          },
          "oidc": {
            "auth_url": "https://xxxxxxxx",
            "client_id": "xxxxxxxx",
            "token_url": "https://xxxxxxxx",
            "user_in_idtoken": xxxxxxxx,
            "userinfo_url": "https://xxxxxxxx"
          },
          "openunison": {
            "replicas": 2
          },
          "services": {
            "pullSecret": "jfrog-auth",
            "resources": {
              "limits": {
                "cpu": "500m",
                "memory": "2048Mi"
              },
              "requests": {
                "cpu": "200m",
                "memory": "1024Mi"
              }
            },
            "token_request_expiration_seconds": xxxxxxxx
          },
          "trusted_certs": [
            {
              "name": "xxxxxxxx",
              "pem_b64": "xxxxxxxx"
            }
          ]
        }
    repoURL: 'https://nexus.tremolo.io/repository/helm/'
    targetRevision: 1.0.24
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - ApplyOutOfSyncOnly=true

Could you please take a look at this and share whether we can fix it in the existing version without upgrading the whole Helm chart?

Version of OpenUnison

OpenUnisonOnUndertow - Starting OpenUnison on Undertow 1.0.24-2021110502

mlbiam commented

Could you please take a look at this and share whether we can fix it in the existing version without upgrading the whole Helm chart?

There's no fix for a version that's two years old or charts that are end-of-life. The version of OpenUnison you're using contains a static configuration that is embedded into the container, so there's nothing that can be fixed via Helm. The new version is configured via CRDs, which provides much more flexibility. Back-porting a fix is a complex process that requires a tremendous amount of QA to reproduce and validate. It's only something we'll do for customers with commercial support contracts.
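
For example, with the CRD-based versions the live configuration sits in the OpenUnison custom resource, so it can be inspected and patched in place (assuming the default resource name orchestra from the charts):

# view the active configuration; the operator reconciles any edit into the running deployment
kubectl get openunisons.openunison.tremolo.io orchestra -n openunison -o yaml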

mlbiam commented

Closing due to inactivity.