Orchestra pods started giving the error "ERROR K8sSessionStore - Could not search k8s" and the URL started returning a Tremolo error
shnigam2 opened this issue · 12 comments
The error details we observed in the orchestra pod logs are below:
[2023-06-30 06:03:01,087][Thread-8] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
[2023-06-30 06:03:01,091][Thread-15] WARN OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x3eca7c10-9267-4dba-b15f-7feca5cd6b28x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
[2023-06-30 06:03:01,104][Thread-15] WARN OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/xbcd3b484-a5ef-4c24-bc1f-2c4a9fd24123x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
[2023-06-30 06:03:01,113][Thread-15] WARN OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/xa821d927-c905-4691-9a33-ba69b300edb2x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
[2023-06-30 06:03:01,122][Thread-15] WARN OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/xcdb7de08-d2ed-4a54-b00d-e2e80fdde73fx' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
[2023-06-30 06:03:01,135][Thread-15] WARN OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x42afb20e-a193-4a40-90ab-1b586cd220bfx' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
[2023-06-30 06:03:01,148][Thread-15] WARN OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x33b35ec8-7dd5-4ca6-b80e-213c610f9034x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
[2023-06-30 06:03:01,160][Thread-15] WARN OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x782dab93-41c8-4f94-a9ce-61c4a4062a55x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
[2023-06-30 06:03:01,172][Thread-15] WARN OpenShiftTarget - Unexpected result calling 'https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions/x1ae33343-476c-4077-927f-c8a2151f56b7x' - 401 / {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
[2023-06-30 06:03:01,173][Thread-15] WARN SessionManagerImpl - Clearing 7 sessions
[2023-06-30 06:03:01,960][Thread-9] INFO K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/orgs?watch=true&timeoutSecond=25
[2023-06-30 06:03:01,972][Thread-9] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
[2023-06-30 06:03:03,162][XNIO-1 task-4] INFO AccessLog - [AzSuccess] - CheckAlive - https://127.0.0.1:8443/check_alive - uid=Anonymous,o=Tremolo - [127.0.0.1] - [f7b58659eceeaa2589cc99c52a5aefe5417d809fa]
[2023-06-30 06:03:03,178][XNIO-1 task-4] INFO AccessLog - [AzSuccess] - k8sIdp - https://127.0.0.1:8443/auth/idp/k8sIdp/.well-known/openid-configuration - uid=Anonymous,o=Tremolo - NONE [127.0.0.1] - [f9d91ca5e7b85abf8b18de00820e76cbe0023929b]
[2023-06-30 06:03:04,037][Thread-18] INFO K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/trusts?watch=true&timeoutSecond=25
[2023-06-30 06:03:04,063][Thread-18] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
[2023-06-30 06:03:04,175][Thread-14] INFO K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/authmechs?watch=true&timeoutSecond=25
[2023-06-30 06:03:04,188][Thread-14] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
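The repeated 401s above suggest the API server is rejecting the token the pod presents. One way to confirm is to replay a failing call with the pod's mounted service-account token (a sketch; the deployment name is an assumption, adjust to your install):

# Pull the service-account token mounted into the orchestra pod
TOKEN=$(kubectl exec -n openunison deploy/openunison-orchestra -- \
  cat /var/run/secrets/kubernetes.io/serviceaccount/token)
# Replay one of the failing calls; a 401 here confirms the API server
# no longer accepts the token the pod is using
curl -sk -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer $TOKEN" \
  https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/oidc-sessions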
@mlbiam Hi Marc, could you please take a look at this? It has been happening frequently these days.
Please provide the following (the commands sketched after this list may help gather them):
- OpenUnison version - you can get this from the beginning of the logs
- Kubernetes version and platform (i.e. EKS, kubeadm, etc.)
- Your values.yaml
- Installation method
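Something along these lines should gather all of the above (a sketch; the deployment and release names are assumptions based on a default install):

kubectl logs -n openunison deploy/openunison-orchestra | head -n 5  # version is printed at startup
kubectl version                                                     # Kubernetes client and server versions
helm get values orchestra -n openunison                             # values used for the release, if installed directly with Helm
helm list -A                                                        # chart versions and releases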
@mlbiam Please find the main error observed in the orchestra pod logs:
[2023-10-13 22:08:00,011][local_Worker-3] ERROR K8sSessionStore - Could not search k8s
java.lang.NullPointerException: null
at com.tremolosecurity.oidc.k8s.K8sSessionStore.cleanOldSessions(K8sSessionStore.java:284) [unison-applications-k8s-1.0.24.jar:?]
at com.tremolosecurity.idp.providers.OpenIDConnectIdP.clearExpiredSessions(OpenIDConnectIdP.java:2215) [unison-idp-openidconnect-1.0.24.jar:?]
at com.tremolosecurity.idp.providers.oidc.model.jobs.ClearSessions.execute(ClearSessions.java:47) [unison-idp-openidconnect-1.0.24.jar:?]
at com.tremolosecurity.provisioning.scheduler.UnisonJob.execute(UnisonJob.java:57) [unison-sdk-1.0.24.jar:?]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-2.3.2.jar:?]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [quartz-2.3.2.jar:?]
[2023-10-13 22:08:00,011][local_Worker-3] ERROR OpenIDConnectIdP - Could not clear sessions
java.lang.Exception: Error searching kubernetes
at com.tremolosecurity.oidc.k8s.K8sSessionStore.cleanOldSessions(K8sSessionStore.java:302) ~[unison-applications-k8s-1.0.24.jar:?]
at com.tremolosecurity.idp.providers.OpenIDConnectIdP.clearExpiredSessions(OpenIDConnectIdP.java:2215) [unison-idp-openidconnect-1.0.24.jar:?]
at com.tremolosecurity.idp.providers.oidc.model.jobs.ClearSessions.execute(ClearSessions.java:47) [unison-idp-openidconnect-1.0.24.jar:?]
at com.tremolosecurity.provisioning.scheduler.UnisonJob.execute(UnisonJob.java:57) [unison-sdk-1.0.24.jar:?]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-2.3.2.jar:?]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [quartz-2.3.2.jar:?]
Caused by: java.lang.NullPointerException
at com.tremolosecurity.oidc.k8s.K8sSessionStore.cleanOldSessions(K8sSessionStore.java:284) ~[unison-applications-k8s-1.0.24.jar:?]
... 5 more
[2023-10-13 22:08:01,237][Thread-11] INFO K8sWatcher - watching https://172.20.0.1:443/apis/openunison.tremolo.io/v1/namespaces/openunison/resultgroups?watch=true&timeoutSecond=25
[2023-10-13 22:08:01,252][Thread-11] ERROR K8sWatcher - Could not run watch, waiting 10 seconds
java.lang.NullPointerException: null
OpenUnison version
k logs openunison-orchestra-d7d9dc9fd-c7dgb -n openunison|grep -i version
[2023-10-14 01:26:32,137][main] INFO xnio - XNIO version 3.8.4.Final
[2023-10-14 01:26:32,246][main] INFO nio - XNIO NIO Implementation Version 3.8.4.Final
Version: V3
[2023-10-14 01:26:44,747][main] INFO StdSchedulerFactory - Quartz scheduler version: 2.3.2
[2023-10-14 01:26:57,641][main] INFO threads - JBoss Threads version 2.3.3.Final
Kubernetes version and platform (i.e. EKS, kubeadm, etc.) - kubeadm & EKS 1.24.10
Your values.yaml:
source:
  repoURL: https://nexus.tremolo.io/repository/helm
  targetRevision: 2.3.34
  chart: orchestra-login-portal-argocd
  helm:
    releaseName: openunison
    values: |
      image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-k8s:sdfedgtjhkrghkghdft
      operator:
        image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-kubernetes-operator:dfghktyirtyritg
        validators: []
        mutators: []
      network:
        openunison_host: "login-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
        dashboard_host: "dashboard-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
        api_server_host: "ou-api-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
        session_inactivity_timeout_seconds: 36000
        k8s_url: ''
        force_redirect_to_tls: false
        createIngressCertificate: false
        ingress_type: none
        ingress_certificate: ou-tls-main-certificate
        ingress_annotations:
          certmanager.k8s.io/cluster-issuer: letsencrypt
          kubernetes.io/ingress.class: nginx
      cert_template:
        ou: "Kubernetes"
        o: "MyOrg"
        l: "My Cluster"
        st: "State of Cluster"
        c: "MyCountry"
      myvd_config_path: "WEB-INF/myvd.conf"
      k8s_cluster_name: "{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
      enable_impersonation: true
      cert_update_image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-kubernetes-operator:dfghktyirtyritg
      impersonation:
        jetstack_oidc_proxy_image: our-repo-cngccp-docker-k8s.jfrog.io/kube-oidc-proxy:eretrtyi45864856834e
        use_jetstack: true
        explicit_certificate_trust: true
        ca_secret_name: ou-tls-certificate
      dashboard:
        namespace: "kubernetes-dashboard"
        cert_name: "kubernetes-dashboard-certs"
        label: "k8s-app=kubernetes-dashboard"
        service_name: kubernetes-dashboard
        require_session: true
      certs:
        use_k8s_cm: false
      trusted_certs: []
      monitoring:
        prometheus_service_account: system:serviceaccount:monitoring:prometheus-k8s
      oidc:
        client_id: "{{metadata.annotations.oidc_client_id}}"
        issuer: https://e52416c3-mckid-us.okta.com
        user_in_idtoken: false
        domain: ""
        scopes: openid email profile groups
        claims:
          sub: sub
          email: email
          given_name: given_name
          family_name: family_name
          display_name: name
          groups: groups
      network_policies:
        enabled: false
        ingress:
          enabled: false
        monitoring:
          enabled: false
        apiserver:
          enabled: false
      services:
        pullSecret: "jfrog-auth"
        enable_tokenrequest: false
        token_request_audience: api
        token_request_expiration_seconds: 14400
        node_selectors: []
        resources:
          limits:
            cpu: 500m
            memory: 2050Mi
          requests:
            cpu: 200m
            memory: 1024Mi
      openunison:
        replicas: 2
        non_secret_data:
          K8S_DB_SSO: oidc
          PROMETHEUS_SERVICE_ACCOUNT: system:serviceaccount:monitoring:prometheus-k8s
        secrets: []
        html:
          image: our-repo-cngccp-docker-k8s.jfrog.io/openunison-k8s-html:d38662bde4ea41efab695a41cb4fc9766a39ece99f9ea4fed2a9ffac7670c0a2
          prefix: openunison
        enable_provisioning: false
source:
  repoURL: https://nexus.tremolo.io/repository/helm/
  targetRevision: 1.0.24
  chart: openunison-k8s-login-oidc
  helm:
    releaseName: orchestra
    values: |
      deployment_data:
        pull_secret: jfrog-auth
      enable_impersonation: true
      image: "our-repo-cngccp-docker-k8s.jfrog.io/openunison-k8s-login-oidc:sderfdserfd"
      impersonation:
        ca_secret_name: ou-tls-certificate
        explicit_certificate_trust: true
        jetstack_oidc_proxy_image: our-repo-cngccp-docker-k8s.jfrog.io/kube-oidc-proxy:swedfrty
        use_jetstack: true
      k8s_cluster_name: "{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
      myvd_configmap: ''
      network:
        api_server_host: "ou-api-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
        createIngressCertificate: false
        dashboard_host: "dashboard-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
        ingress_annotations:
          certmanager.k8s.io/cluster-issuer: letsencrypt
          kubernetes.io/ingress.class: openunison
        k8s_url: ''
        openunison_host: "login-{{metadata.annotations.name}}-{{metadata.annotations.environment}}-{{metadata.annotations.region}}-aws.cf.platform.our-repo.cloud"
        session_inactivity_timeout_seconds: 36000
        ingress_type: none
      oidc:
        auth_url: https://ourorg-orgid-us.okta.com/oauth2/v1/authorize
        client_id: "{{metadata.annotations.oidc_client_id}}"
        token_url: https://ourorg-orgid-us.okta.com/oauth2/v1/token
        user_in_idtoken: false
        userinfo_url: https://ourorg-orgid-us.okta.com/oauth2/v1/userinfo
      openunison:
        replicas: 2
      services:
        pullSecret: jfrog-auth
        resources:
          limits:
            cpu: 500m
            memory: 2050Mi
          requests:
            cpu: 200m
            memory: 1024Mi
        token_request_expiration_seconds: 14400
      trusted_certs: []
Installation method - Helm
@mlbiam Could you please help us with this issue? We are seeing it frequently on our clusters, and it is causing unnecessary user noise.
I need the version of OpenUnison. If you're not using one of our versioned images, please get it from the first line of the logs. It will look like: OpenUnisonOnUndertow - Starting OpenUnison on Undertow 1.0.38-2023072501
The stack trace looks like it's associated with an old OpenUnison image. Can you also give the original source image labels? The references to your internal jfrog don't really give me any information.
Thanks
@mlbiam It is as below:
OpenUnisonOnUndertow - Starting OpenUnison on Undertow 1.0.24-2021110502
That's almost two years old. I doubt it would work with the 2.3.34 orchestra-login-portal-argocd helm chart. Based on the error message I think what's happening is that the token in the container is expiring. It works on restart because you're getting a new token.
Since you're already using a modern chart, I'd suggest using the 1.0.37 version of the container - ghcr.io/openunison/openunison-k8s:1.0.37
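If you want to sanity-check the expired-token theory before upgrading, decode the exp claim of the token the pod is actually using (a sketch; assumes jq is installed and the default token mount path, and the base64 payload segment may need '=' padding to decode cleanly):

# Print the issued-at and expiry claims of the pod's mounted token
kubectl exec -n openunison deploy/openunison-orchestra -- \
  cat /var/run/secrets/kubernetes.io/serviceaccount/token \
  | cut -d '.' -f 2 | base64 -d 2>/dev/null | jq '{iat, exp}'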
Hi @mlbiam, one correction: we are still using the old Helm charts and are working on testing the upgraded ones. Could you please check below whether we can fix this issue on the existing version we are using?
We are using multiple Application objects for openunison and orchestra. Below are the ArgoCD objects with Helm values.
Openunison
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: '-2'
  name: openunison
  namespace: argocd
spec:
  destination:
    namespace: openunison
    server: 'https://xxxxxxxx'
  project: cnt
  source:
    chart: openunison-operator
    helm:
      releaseName: openunison
      values: |-
        {
          "image": "xxxxxxxx/openunison-k8s-operator:xxxxxxxx",
          "services": {
            "pullSecret": "jfrog-auth"
          }
        }
    repoURL: 'https://nexus.tremolo.io/repository/helm/'
    targetRevision: 2.0.6
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - ApplyOutOfSyncOnly=true
Orchestra
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orchestra
  namespace: argocd
spec:
  destination:
    namespace: openunison
    server: 'https://xxxxxxxx'
  project: cnt
  source:
    chart: openunison-k8s-login-oidc
    helm:
      releaseName: orchestra
      values: |-
        {
          "cert_template": {
            "c": "xxxxxxxx",
            "l": "xxxxxxxx",
            "o": "dev",
            "ou": "xxxxxxxx",
            "st": "xxxxxxxx"
          },
          "deployment_data": {
            "pull_secret": "jfrog-auth"
          },
          "enable_impersonation": true,
          "image": "xxxxxxxx/openunison-k8s-login-oidc:xxxxxxxx",
          "impersonation": {
            "ca_secret_name": "xxxxxxxx",
            "explicit_certificate_trust": true,
            "jetstack_oidc_proxy_image": "xxxxxxxx/kube-oidc-proxy:xxxxxxxx",
            "oidc_tls_secret_name": "tls-certificate",
            "use_jetstack": true
          },
          "k8s_cluster_name": "xxxxxxxx",
          "myvd_configmap": "",
          "network": {
            "api_server_host": "dev-ou-api.com",
            "createIngressCertificate": false,
            "dashboard_host": "dev-dashboard.com",
            "ingress_annotations": {
              "certmanager.k8s.io/cluster-issuer": "letsencrypt",
              "kubernetes.io/ingress.class": "openunison"
            },
            "ingress_certificate": "",
            "ingress_type": "none",
            "k8s_url": "",
            "openunison_host": "dev-login.com",
            "session_inactivity_timeout_seconds": xxxxxxxx
          },
          "oidc": {
            "auth_url": "https://xxxxxxxx",
            "client_id": "xxxxxxxx",
            "token_url": "https://xxxxxxxx",
            "user_in_idtoken": xxxxxxxx,
            "userinfo_url": "https://xxxxxxxx"
          },
          "openunison": {
            "replicas": 2
          },
          "services": {
            "pullSecret": "jfrog-auth",
            "resources": {
              "limits": {
                "cpu": "500m",
                "memory": "2048Mi"
              },
              "requests": {
                "cpu": "200m",
                "memory": "1024Mi"
              }
            },
            "token_request_expiration_seconds": xxxxxxxx
          },
          "trusted_certs": [
            {
              "name": "xxxxxxxx",
              "pem_b64": "xxxxxxxx"
            }
          ]
        }
    repoURL: 'https://nexus.tremolo.io/repository/helm/'
    targetRevision: 1.0.24
  syncPolicy:
    automated:
      prune: true
    syncOptions:
      - ApplyOutOfSyncOnly=true
Could you please take a look at this and let us know if we can fix it in the existing version without upgrading the whole Helm chart?
Version of OpenUnison
OpenUnisonOnUndertow - Starting OpenUnison on Undertow 1.0.24-2021110502
Could you please take a look at this and let us know if we can fix it in the existing version without upgrading the whole Helm chart?
There's no fix for a version that's two years old or charts that are end-of-life. The version of OpenUnison you're using contains a static configuration that is embedded into the container, so there's nothing that can be fixed via Helm. The new version is configured via CRDs, which provides much more flexibility. Back-porting a fix is a complex process that requires a tremendous amount of QA to reproduce and validate; it's only something we'll do for customers with commercial support contracts.
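To illustrate the difference, on current CRD-based versions the configuration is a Kubernetes object you can inspect and patch in place rather than a file baked into the image (a sketch; resource and namespace names depend on your install):

# List the OpenUnison custom resources the operator reconciles
kubectl get openunisons.openunison.tremolo.io -n openunison
# View (or kubectl edit) the full configuration for a deployment
kubectl get openunison orchestra -n openunison -o yaml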
Closing due to inactivity.