Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory
prachiwaghulkar opened this issue · 21 comments
What did you do to encounter the bug?
Applied the MongoDB CR using MongoDB image 5.0.26.
The mongodb pod is in CrashLoopBackOff and the mongodbcommunity resource is stuck in the Pending state.
mongodb-kubernetes-operator-54c9d54fbc-mch6k 1/1 Running 0 8m49s
staging-mongodb-0 0/2 CrashLoopBackOff 1 (3s ago) 26s
prachiwaghulkar@Prachis-MacBook-Pro ~ % oc get mongodbcommunity
NAME PHASE VERSION
staging-mongodb Pending
Pod logs give the following error:
oc logs -p staging-mongodb-0
Defaulted container "mongod" out of: mongod, mongodb-agent, mongod-posthook (init), mongodb-agent-readinessprobe (init)
exec /bin/sh: exec format error
Describing the pod shows the following errors in its events:
Warning BackOff 21s (x2 over 22s) kubelet Back-off restarting failed container
Warning Unhealthy 15s kubelet Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory
goroutine 1 [running]:
main.main()
/workspace/cmd/readiness/main.go:226 +0x191
What did you expect?
/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json should exist and this error should not occur.
What happened instead?
The /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json file doesn't exist and the error above is thrown. The mongodb pod is in CrashLoopBackOff.
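A quick way to confirm the missing file and see what the agent is doing in the meantime (a minimal sketch using the pod and container names above; the mongodb-agent container keeps running even while mongod crash-loops):
# Does the health file the readiness probe panics on exist yet?
oc exec staging-mongodb-0 -c mongodb-agent -- ls -l /var/log/mongodb-mms-automation/healthstatus/
# What is the agent logging while it waits?
oc logs staging-mongodb-0 -c mongodb-agent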
Operator Information
- Operator Version: 0.9.0
- MongoDB Image used: 5.0.26
Operator logs:
Running ./manager
2024-04-18T15:08:25.013Z INFO manager/main.go:74 Watching namespace: staging
I0418 15:08:26.063962 10 request.go:690] Waited for 1.037245763s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/imageregistry.operator.openshift.io/v1?timeout=32s
2024-04-18T15:08:28.669Z INFO manager/main.go:91 Registering Components.
2024-04-18T15:08:28.669Z INFO manager/main.go:104 Starting the Cmd.
2024-04-18T15:16:41.150Z INFO controllers/replica_set_controller.go:130 Reconciling MongoDB {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.151Z DEBUG controllers/replica_set_controller.go:132 Validating MongoDB.Spec {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.151Z DEBUG controllers/replica_set_controller.go:142 Ensuring the service exists {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.352Z DEBUG agent/replica_set_port_manager.go:122 No port change required {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.414Z INFO controllers/replica_set_controller.go:468 Create/Update operation succeeded {"ReplicaSet": "staging/staging-mongodb", "operation": "created"}
2024-04-18T15:16:41.414Z INFO controllers/mongodb_tls.go:43 Ensuring TLS is correctly configured {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.414Z INFO controllers/mongodb_tls.go:86 Successfully validated TLS config {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.414Z INFO controllers/replica_set_controller.go:293 TLS is enabled, creating/updating CA secret {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.430Z INFO controllers/replica_set_controller.go:297 TLS is enabled, creating/updating TLS secret {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.438Z DEBUG controllers/replica_set_controller.go:400 Enabling TLS on a deployment with a StatefulSet that is not Ready, the Automation Config must be updated first {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.438Z INFO controllers/replica_set_controller.go:360 Creating/Updating AutomationConfig {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.438Z DEBUG scram/scram.go:128 No existing credentials found, generating new credentials
2024-04-18T15:16:41.438Z DEBUG scram/scram.go:106 Generating new credentials and storing in secret/root-scram2-scram-credentials
2024-04-18T15:16:41.561Z DEBUG scram/scram.go:117 Successfully generated SCRAM credentials
2024-04-18T15:16:41.561Z DEBUG scram/scram.go:128 No existing credentials found, generating new credentials
2024-04-18T15:16:41.561Z DEBUG scram/scram.go:106 Generating new credentials and storing in secret/metadata-scram2-scram-credentials
2024-04-18T15:16:41.637Z DEBUG scram/scram.go:117 Successfully generated SCRAM credentials
2024-04-18T15:16:41.854Z DEBUG agent/replica_set_port_manager.go:122 No port change required {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.854Z DEBUG agent/replica_set_port_manager.go:40 Calculated process port map: map[staging-mongodb-0:27017] {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.854Z DEBUG controllers/replica_set_controller.go:535 AutomationConfigMembersThisReconciliation {"mdb.AutomationConfigMembersThisReconciliation()": 1}
2024-04-18T15:16:41.908Z DEBUG controllers/replica_set_controller.go:379 The existing StatefulSet did not have the readiness probe init container, skipping pod annotation check. {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.908Z INFO controllers/replica_set_controller.go:335 Creating/Updating StatefulSet {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.918Z INFO controllers/replica_set_controller.go:340 Creating/Updating StatefulSet for Arbiters {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.961Z DEBUG controllers/replica_set_controller.go:350 Ensuring StatefulSet is ready, with type: RollingUpdate {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.961Z INFO controllers/mongodb_status_options.go:110 ReplicaSet is not yet ready, retrying in 10 seconds
2024-04-18T15:16:41.981Z INFO controllers/replica_set_controller.go:130 Reconciling MongoDB {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.981Z DEBUG controllers/replica_set_controller.go:132 Validating MongoDB.Spec {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.981Z DEBUG controllers/replica_set_controller.go:142 Ensuring the service exists {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.982Z DEBUG agent/replica_set_port_manager.go:122 No port change required {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.986Z INFO controllers/replica_set_controller.go:468 Create/Update operation succeeded {"ReplicaSet": "staging/staging-mongodb", "operation": "updated"}
2024-04-18T15:16:41.986Z INFO controllers/mongodb_tls.go:43 Ensuring TLS is correctly configured {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.986Z INFO controllers/mongodb_tls.go:86 Successfully validated TLS config {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.986Z INFO controllers/replica_set_controller.go:293 TLS is enabled, creating/updating CA secret {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:41.992Z INFO controllers/replica_set_controller.go:297 TLS is enabled, creating/updating TLS secret {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:42.097Z DEBUG controllers/replica_set_controller.go:400 Enabling TLS on a deployment with a StatefulSet that is not Ready, the Automation Config must be updated first {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:42.097Z INFO controllers/replica_set_controller.go:360 Creating/Updating AutomationConfig {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:42.106Z DEBUG scram/scram.go:101 Credentials have not changed, using credentials stored in: secret/root-scram2-scram-credentials
2024-04-18T15:16:42.114Z DEBUG scram/scram.go:101 Credentials have not changed, using credentials stored in: secret/metadata-scram2-scram-credentials
2024-04-18T15:16:42.114Z DEBUG agent/replica_set_port_manager.go:122 No port change required {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:42.114Z DEBUG agent/replica_set_port_manager.go:40 Calculated process port map: map[staging-mongodb-0:27017] {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:42.114Z DEBUG controllers/replica_set_controller.go:535 AutomationConfigMembersThisReconciliation {"mdb.AutomationConfigMembersThisReconciliation()": 1}
2024-04-18T15:16:42.115Z DEBUG controllers/replica_set_controller.go:383 Waiting for agents to reach version 1 {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:42.115Z INFO controllers/replica_set_controller.go:335 Creating/Updating StatefulSet {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:42.197Z INFO controllers/replica_set_controller.go:340 Creating/Updating StatefulSet for Arbiters {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:42.289Z DEBUG controllers/replica_set_controller.go:350 Ensuring StatefulSet is ready, with type: RollingUpdate {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:42.289Z INFO controllers/mongodb_status_options.go:110 ReplicaSet is not yet ready, retrying in 10 seconds
2024-04-18T15:16:42.303Z INFO controllers/replica_set_controller.go:130 Reconciling MongoDB {"ReplicaSet": "staging/staging-mongodb"}
2024-04-18T15:16:42.303Z DEBUG controllers/replica_set_controller.go:132 Validating MongoDB.Spec {"ReplicaSet": "staging/staging-mongodb"}
- You need to configure the users field in the mongodbcommunity custom resource.
- You need to modify the readinessProbe.initialDelaySeconds of the mongod container to 10 (see the patch sketch below).
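For reference, the probe change can be applied with a patch along these lines (a sketch; the resource name staging-mongodb is taken from this thread, and a JSON merge patch replaces the whole containers override list, so include any other container overrides you already have):
kubectl patch mongodbcommunity staging-mongodb --type=merge -p '{"spec":{"statefulSet":{"spec":{"template":{"spec":{"containers":[{"name":"mongod","readinessProbe":{"initialDelaySeconds":10}}]}}}}}}'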
@laiminhtrung1997 Unfortunately, the readiness probe still fails and the pod goes into CrashLoopBackOff. I have set readinessProbe.initialDelaySeconds to 10 on the mongod container, and the users field was already configured in the mongodbcommunity CR.
Normal Created 6m40s (x3 over 6m54s) kubelet Created container mongod
Warning Unhealthy 6m40s (x2 over 6m40s) kubelet Readiness probe failed:
Normal Started 6m39s (x3 over 6m54s) kubelet Started container mongod
Warning BackOff 111s (x25 over 6m51s) kubelet Back-off restarting failed container
Dear @prachiwaghulkar
Could you please provide the manifest of your mdbc cr?
@laiminhtrung1997 PFB the mdbc cr manifest.
apiVersion: v1
items:
- apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
name: staging-mongodb
namespace: staging
spec:
additionalMongodConfig:
net.maxIncomingConnections: 900
featureCompatibilityVersion: "5.0"
members: 1
security:
authentication:
ignoreUnknownUsers: true
modes:
- SCRAM
tls:
caConfigMapRef:
name: staging-mongodb-cert-ca-cm
certificateKeySecretRef:
name: staging-mongodb-cert
enabled: true
statefulSet:
spec:
template:
spec:
containers:
- image: docker-na-public.artifactory.swg-devops.com/sec-guardium-next-gen-docker-local/mongo:5.0.26
name: mongod
readinessProbe:
initialDelaySeconds: 10
resources:
limits:
cpu: "4"
ephemeral-storage: 5Gi
memory: 10Gi
requests:
cpu: "1"
ephemeral-storage: 1Gi
memory: 2Gi
imagePullSecrets:
- name: ibm-entitlement-key
initContainers:
- name: mongodb-agent-readinessprobe
resources:
limits:
cpu: 100m
memory: 500Mi
requests:
cpu: 6m
memory: 6Mi
- name: mongod-posthook
resources:
limits:
cpu: 100m
memory: 500Mi
requests:
cpu: 6m
memory: 6Mi
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: data-volume
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
storageClassName: rook-ceph-block
volumeMode: Filesystem
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: logs-volume
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
storageClassName: rook-ceph-block
volumeMode: Filesystem
type: ReplicaSet
users:
- db: admin
name: root
passwordSecretRef:
key: mongodbRootPassword
name: ibm-mongodb-authsecret
roles:
- db: admin
name: clusterAdmin
- db: admin
name: userAdminAnyDatabase
- db: admin
name: readWriteAnyDatabase
scramCredentialsSecretName: root-scram2
- db: tnt_mbr_meta
name: metadata
passwordSecretRef:
key: mongodbMetadataPassword
name: ibm-mongodb-authsecret
roles:
- db: tnt_mbr_meta
name: dbOwner
scramCredentialsSecretName: metadata-scram2
version: 5.0.26
Could you also provide the log of the mongodb-agent container in mongodb-0, please?
The log of mongodb-agent:
prachiwaghulkar@Prachis-MacBook-Pro cert-request % oc logs pod/staging-mongodb-0 -c mongodb-agent
cat: /mongodb-automation/agent-api-key/agentApiKey: No such file or directory
[2024-04-19T05:31:54.604+0000] [.debug] [util/distros/distros.go:LinuxFlavorAndVersionUncached:144] Detected linux flavor ubuntu version 20.4
Hmmmm. My mdbc does not configure TLS, and MongoDB started without any errors. I have no idea. Sorry I cannot help you.
@irajdeep Can anybody from the community take a look and assist here? It is important for us to move to 5.0.26.
@prachiwaghulkar can you please provide the agent logs and health logs as described here?
https://github.com/mongodb/mongodb-kubernetes-operator/blob/master/.github/ISSUE_TEMPLATE/bug_report.md
Having said that, exec /bin/sh: exec format error looks like an architecture mismatch. Are you running an arm image on amd64, or an amd64 image on arm? I suggest changing it to the matching architecture and testing again.
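Two quick checks for that (a sketch; the exec works because the mongodb-agent container stays up, and docker manifest inspect needs pull access to your registry):
# Architecture of the node the pod runs on:
kubectl exec staging-mongodb-0 -c mongodb-agent -- uname -m
# Architectures actually published for the image tag you deploy:
docker manifest inspect docker-na-public.artifactory.swg-devops.com/sec-guardium-next-gen-docker-local/mongo:5.0.26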
@nammn I have used the following image: sha256:0172fb2a286d3dc9823f0e377587c0a545022bd330c817ed6b8bc231ea0643ad, which is linux/amd64. We are updating from 5.0.24 to 5.0.26; 5.0.24 on amd64 worked fine for us.
PFB the logs:
Agent logs:
(venv) prachiwaghulkar@Prachis-MacBook-Pro ~ % kubectl exec -it staging-mongodb-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/automation-agent.log
[2024-04-22T06:23:30.847+0000] [header.info] [::0] GitCommitId = 956e3386ad456471db1776d79637a38f182a6088
[2024-04-22T06:23:30.847+0000] [header.info] [::0] AutomationVersion = 107.0.0.8465
[2024-04-22T06:23:30.847+0000] [header.info] [::0] localhost = staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local
[2024-04-22T06:23:30.847+0000] [header.info] [::0] ErrorStateSleepTime = 10s
[2024-04-22T06:23:30.847+0000] [header.info] [::0] GoalStateSleepTime = 10s
[2024-04-22T06:23:30.847+0000] [header.info] [::0] NotGoalStateSleepTime = 1s
[2024-04-22T06:23:30.847+0000] [header.info] [::0] PlanCutoffTime = 300000
[2024-04-22T06:23:30.847+0000] [header.info] [::0] TracePlanner = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0] User = 2000
[2024-04-22T06:23:30.847+0000] [header.info] [::0] Go version = go1.20.10
[2024-04-22T06:23:30.847+0000] [header.info] [::0] MmsBaseURL =
[2024-04-22T06:23:30.847+0000] [header.info] [::0] MmsGroupId =
[2024-04-22T06:23:30.847+0000] [header.info] [::0] HttpProxy =
[2024-04-22T06:23:30.847+0000] [header.info] [::0] DisableHttpKeepAlive = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0] HttpsCAFile =
[2024-04-22T06:23:30.847+0000] [header.info] [::0] TlsRequireValidMMSServerCertificates = true
[2024-04-22T06:23:30.847+0000] [header.info] [::0] TlsMMSServerClientCertificate =
[2024-04-22T06:23:30.847+0000] [header.info] [::0] KMIPProxyCertificateDir = /tmp
[2024-04-22T06:23:30.847+0000] [header.info] [::0] EnableLocalConfigurationServer = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0] DialTimeoutSeconds = 40
[2024-04-22T06:23:30.847+0000] [header.info] [::0] KeepUnusedMongodbVersions = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0] DisallowDowngrades = false
[2024-04-22T06:23:30.846+0000] [.error] [src/action/start.go:func1:145] [103] <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:23:30.846+0000] [.error] [src/director/director.go:executePlan:988] <staging-mongodb-0> [06:23:30.846] Failed to apply action. Result = <nil> : <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:23:30.846+0000] [.error] [src/director/director.go:planAndExecute:585] <staging-mongodb-0> [06:23:30.846] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [06:23:30.846] Failed to apply action. Result = <nil> : <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:23:30.846+0000] [.error] [src/director/director.go:mainLoop:394] <staging-mongodb-0> [06:23:30.846] Failed to planAndExecute : <staging-mongodb-0> [06:23:30.846] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [06:23:30.846] Failed to apply action. Result = <nil> : <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:50:56.215+0000] [header.info] [::0] GitCommitId = 956e3386ad456471db1776d79637a38f182a6088
[2024-04-22T06:50:56.215+0000] [header.info] [::0] AutomationVersion = 107.0.0.8465
[2024-04-22T06:50:56.215+0000] [header.info] [::0] localhost = staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local
[2024-04-22T06:50:56.215+0000] [header.info] [::0] ErrorStateSleepTime = 10s
[2024-04-22T06:50:56.215+0000] [header.info] [::0] GoalStateSleepTime = 10s
[2024-04-22T06:50:56.215+0000] [header.info] [::0] NotGoalStateSleepTime = 1s
[2024-04-22T06:50:56.215+0000] [header.info] [::0] PlanCutoffTime = 300000
[2024-04-22T06:50:56.215+0000] [header.info] [::0] TracePlanner = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0] User = 2000
[2024-04-22T06:50:56.215+0000] [header.info] [::0] Go version = go1.20.10
[2024-04-22T06:50:56.215+0000] [header.info] [::0] MmsBaseURL =
[2024-04-22T06:50:56.215+0000] [header.info] [::0] MmsGroupId =
[2024-04-22T06:50:56.215+0000] [header.info] [::0] HttpProxy =
[2024-04-22T06:50:56.215+0000] [header.info] [::0] DisableHttpKeepAlive = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0] HttpsCAFile =
[2024-04-22T06:50:56.215+0000] [header.info] [::0] TlsRequireValidMMSServerCertificates = true
[2024-04-22T06:50:56.215+0000] [header.info] [::0] TlsMMSServerClientCertificate =
[2024-04-22T06:50:56.215+0000] [header.info] [::0] KMIPProxyCertificateDir = /tmp
[2024-04-22T06:50:56.215+0000] [header.info] [::0] EnableLocalConfigurationServer = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0] DialTimeoutSeconds = 40
[2024-04-22T06:50:56.215+0000] [header.info] [::0] KeepUnusedMongodbVersions = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0] DisallowDowngrades = false
[2024-04-22T06:50:56.253+0000] [.error] [main/components/agent.go:ApplyClusterConfig:358] [06:50:56.253] Log path absent for process=state.ProcessConfigName=staging-mongodb-0ProcessType=mongodVersion=5.0.26FullVersion={"trueName":"5.0.26","gitVersion":"","modules":[],"major":5,"minor":0,"patch":26}Disabled=falseManualMode=falseNumCores=0CpuAffinity=[]LogRotate={"sizeThresholdMB":0,"timeThresholdHrs":0,"numUncompressed":0,"numTotal":0,"percentOfDiskspace":0,"includeAuditLogsWithMongoDBLogs":false}AuditLogRotate=<nil>LastResync="0001-01-01T00:00:00Z"LastThirdPartyRestoreResync="0001-01-01T00:00:00Z"LastCompact="0001-01-01T00:00:00Z"LastKmipMasterKeyRotation="0001-01-01T00:00:00Z"LastRestart="0001-01-01T00:00:00Z"Hostname=staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.localAlias=Cluster=AuthSchemaVersion=5FeatureCompatibilityVersion=5.0Kerberos=<nil>Args={"net":{"bindIp":"0.0.0.0","maxIncomingConnections":900,"port":27017,"tls":{"CAFile":"/var/lib/tls/ca/4971db0032afff31ab1235e283ef9ab7c9a4a483d630427923d253a41152cf13.pem","allowConnectionsWithoutCertificates":true,"certificateKeyFile":"/var/lib/tls/server/f131542c6e26217a9f960431d5177cd904c1a5661fd08482f4a194e836baa228.pem","mode":"requireTLS"}},"replication":{"replSetName":"staging-mongodb"},"setParameter":{"authenticationMechanisms":"SCRAM-SHA-256"},"storage":{"dbPath":"/data"}}ProcessAuthInfo={"UsersWanted":[{"user":"root","db":"admin","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"ztQzG8EXxgOT8qSloG6LfA==","storedKey":"+p4exhjiiYZoOIHahzi414ZINBs=","serverKey":"iV4qydBHQksSjyzTXDidhvn/9iY="},"scramSha256Creds":{"iterationCount":15000,"salt":"CcGExKTDjHefywe7CtG1VdfnqA9clT12VRz6MA==","storedKey":"JLVuWDSmdtJNNXRNVKf6Jw7MsofcbJP9G0N03N66Yb0=","serverKey":"d8R15D/XS9YVXwwDb6NjHBMCoYIrIxeUYU7PAK8tw7k="},"roles":[{"role":"clusterAdmin","db":"admin","minFcv":""},{"role":"readWriteAnyDatabase","db":"admin","minFcv":""},{"role":"userAdminAnyDatabase","db":"admin","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null},{"user":"metadata","db":"tnt_mbr_meta","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"iTSe6nHUP2rYsv8XvRgnnA==","storedKey":"sh4Q4pq/+EnduxDhyLEaY6bix3Y=","serverKey":"sL7I88TKpiWOJcD1X2MJHxBIIAg="},"scramSha256Creds":{"iterationCount":15000,"salt":"1zfBNBYr0OXWlPpMdZsoark+HcMfxoX0MltBpQ==","storedKey":"uBZBpVzBawhgY1wp8p52UlTzAtpkOc3UEgKC7JGPwbU=","serverKey":"EacvAm/pNKMyUobWrb0aL8+Og3BJ/W174YVhLMn8SWU="},"roles":[{"role":"dbOwner","db":"tnt_mbr_meta","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null}],"UsersDeleted":null,"Roles":null,"DesiredKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredNewKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredKeyHash":"KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho=","DesiredNewKeyHash":null,"KeyfileHashes":["KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho="],"UsingAuth":true}IsConfigServer=falseIsShardServer=falseIsInReplSet=trueIsStandalone=falseIsArbiter=falseDownloadBase=FullySyncRsTags=falseReplicaSetId=staging-mongodbBackupRestoreUrl=<redacted>, 
BackupRestoreUrlV3=BackupParallelRestoreUrl=BackupParallelRestoreNumChunks=0BackupParallelRestoreNumWorkers=0BackupThirdPartyRestoreBaseUrl=BackupRestoreRsVersion=0BackupRestoreElectionTerm=0BackupRestoreCheckpointTimestamp=<nil>BackupRestoreCertificateValidationHostname=BackupRestoreSystemUsersUUID=BackupRestoreSystemRolesUUID=BackupRestoreBalancerSettings=nullBackupRestoreConfigSettingsUUID=BackupShardIdRestoreMaps=[]DirectAttachVerificationKey=DirectAttachSourceClusterName=DirectAttachShouldFilterByFileList=falseConfigPath=StorageEngine=BackupRestoreOplogBaseUrl=BackupRestoreOplog=<nil>BackupRestoreDesiredTime=<nil>BackupRestoreSourceRsId=BackupRestoreFilterList=<nil>BackupRestoreFilteredFileListUrl=BackupRestoreJobId=BackupRestoreVerificationKey=BackupRestoreSourceGroupId=PitRestoreType=BackupThirdPartyOplogStoreType=EncryptionProviderType=KMIPProxyPort=0KMIPProxyDisabled=falseTemporaryPort=0CredentialsVersion=0Repair=nullRealtimeConfig=<nil>DataExplorerConfig=<nil>DefaultRWConcern=<nil>LdapCaPath=ConfigServers=[]RestartIntervalTimeMs=<nil>ClusterWideConfiguration=ProfilingConfig=<nil>RegionBaseUrl=RegionBaseRealtimeUrl=RegionBaseAgentUrl=StepDownPrimaryForResync=falsekey=<nil>keyLock=null. log destination=
[2024-04-22T06:50:56.254+0000] [.error] [src/main/cm.go:mainLoop:520] [06:50:56.254] Error applying desired cluster configs : [06:50:56.253] Log path absent for process=state.ProcessConfigName=staging-mongodb-0ProcessType=mongodVersion=5.0.26FullVersion={"trueName":"5.0.26","gitVersion":"","modules":[],"major":5,"minor":0,"patch":26}Disabled=falseManualMode=falseNumCores=0CpuAffinity=[]LogRotate={"sizeThresholdMB":0,"timeThresholdHrs":0,"numUncompressed":0,"numTotal":0,"percentOfDiskspace":0,"includeAuditLogsWithMongoDBLogs":false}AuditLogRotate=<nil>LastResync="0001-01-01T00:00:00Z"LastThirdPartyRestoreResync="0001-01-01T00:00:00Z"LastCompact="0001-01-01T00:00:00Z"LastKmipMasterKeyRotation="0001-01-01T00:00:00Z"LastRestart="0001-01-01T00:00:00Z"Hostname=staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.localAlias=Cluster=AuthSchemaVersion=5FeatureCompatibilityVersion=5.0Kerberos=<nil>Args={"net":{"bindIp":"0.0.0.0","maxIncomingConnections":900,"port":27017,"tls":{"CAFile":"/var/lib/tls/ca/4971db0032afff31ab1235e283ef9ab7c9a4a483d630427923d253a41152cf13.pem","allowConnectionsWithoutCertificates":true,"certificateKeyFile":"/var/lib/tls/server/f131542c6e26217a9f960431d5177cd904c1a5661fd08482f4a194e836baa228.pem","mode":"requireTLS"}},"replication":{"replSetName":"staging-mongodb"},"setParameter":{"authenticationMechanisms":"SCRAM-SHA-256"},"storage":{"dbPath":"/data"}}ProcessAuthInfo={"UsersWanted":[{"user":"root","db":"admin","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"ztQzG8EXxgOT8qSloG6LfA==","storedKey":"+p4exhjiiYZoOIHahzi414ZINBs=","serverKey":"iV4qydBHQksSjyzTXDidhvn/9iY="},"scramSha256Creds":{"iterationCount":15000,"salt":"CcGExKTDjHefywe7CtG1VdfnqA9clT12VRz6MA==","storedKey":"JLVuWDSmdtJNNXRNVKf6Jw7MsofcbJP9G0N03N66Yb0=","serverKey":"d8R15D/XS9YVXwwDb6NjHBMCoYIrIxeUYU7PAK8tw7k="},"roles":[{"role":"clusterAdmin","db":"admin","minFcv":""},{"role":"readWriteAnyDatabase","db":"admin","minFcv":""},{"role":"userAdminAnyDatabase","db":"admin","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null},{"user":"metadata","db":"tnt_mbr_meta","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"iTSe6nHUP2rYsv8XvRgnnA==","storedKey":"sh4Q4pq/+EnduxDhyLEaY6bix3Y=","serverKey":"sL7I88TKpiWOJcD1X2MJHxBIIAg="},"scramSha256Creds":{"iterationCount":15000,"salt":"1zfBNBYr0OXWlPpMdZsoark+HcMfxoX0MltBpQ==","storedKey":"uBZBpVzBawhgY1wp8p52UlTzAtpkOc3UEgKC7JGPwbU=","serverKey":"EacvAm/pNKMyUobWrb0aL8+Og3BJ/W174YVhLMn8SWU="},"roles":[{"role":"dbOwner","db":"tnt_mbr_meta","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null}],"UsersDeleted":null,"Roles":null,"DesiredKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredNewKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredKeyHash":"KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho=","DesiredNewKeyHash":null,"KeyfileHashes":["KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho="],"UsingAuth":true}IsConfigServer=falseIsShardServer=falseIsInReplSet=trueIsStandalone=falseIsArbiter=falseDownloadBase=FullySyncRsTags=falseReplicaSetId=staging-mongodbBackupRestoreUrl=<redacted>, 
BackupRestoreUrlV3=BackupParallelRestoreUrl=BackupParallelRestoreNumChunks=0BackupParallelRestoreNumWorkers=0BackupThirdPartyRestoreBaseUrl=BackupRestoreRsVersion=0BackupRestoreElectionTerm=0BackupRestoreCheckpointTimestamp=<nil>BackupRestoreCertificateValidationHostname=BackupRestoreSystemUsersUUID=BackupRestoreSystemRolesUUID=BackupRestoreBalancerSettings=nullBackupRestoreConfigSettingsUUID=BackupShardIdRestoreMaps=[]DirectAttachVerificationKey=DirectAttachSourceClusterName=DirectAttachShouldFilterByFileList=falseConfigPath=StorageEngine=BackupRestoreOplogBaseUrl=BackupRestoreOplog=<nil>BackupRestoreDesiredTime=<nil>BackupRestoreSourceRsId=BackupRestoreFilterList=<nil>BackupRestoreFilteredFileListUrl=BackupRestoreJobId=BackupRestoreVerificationKey=BackupRestoreSourceGroupId=PitRestoreType=BackupThirdPartyOplogStoreType=EncryptionProviderType=KMIPProxyPort=0KMIPProxyDisabled=falseTemporaryPort=0CredentialsVersion=0Repair=nullRealtimeConfig=<nil>DataExplorerConfig=<nil>DefaultRWConcern=<nil>LdapCaPath=ConfigServers=[]RestartIntervalTimeMs=<nil>ClusterWideConfiguration=ProfilingConfig=<nil>RegionBaseUrl=RegionBaseRealtimeUrl=RegionBaseAgentUrl=StepDownPrimaryForResync=falsekey=<nil>keyLock=null. log destination=
[2024-04-22T07:22:36.561+0000] [.error] [src/action/start.go:sleepUntilProcessUp:267] <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/action/start.go:func1:145] [103] <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/director/director.go:executePlan:988] <staging-mongodb-0> [07:22:36.561] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/director/director.go:planAndExecute:585] <staging-mongodb-0> [07:22:36.561] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:22:36.561] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/director/director.go:mainLoop:394] <staging-mongodb-0> [07:22:36.561] Failed to planAndExecute : <staging-mongodb-0> [07:22:36.561] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:22:36.561] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/action/start.go:sleepUntilProcessUp:267] <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/action/start.go:func1:145] [103] <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/director/director.go:executePlan:988] <staging-mongodb-0> [07:54:17.873] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/director/director.go:planAndExecute:585] <staging-mongodb-0> [07:54:17.873] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:54:17.873] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/director/director.go:mainLoop:394] <staging-mongodb-0> [07:54:17.873] Failed to planAndExecute : <staging-mongodb-0> [07:54:17.873] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:54:17.873] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
Health logs:
(venv) prachiwaghulkar@Prachis-MacBook-Pro ~ % kubectl exec -it staging-mongodb-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
{"statuses":{"staging-mongodb-0":{"IsInGoalState":false,"LastMongoUpTime":0,"ExpectedToBeUp":true,"ReplicationStatus":-1}},"mmsStatus":{"staging-mongodb-0":{"name":"staging-mongodb-0","lastGoalVersionAchieved":-1,"plans":[{"automationConfigVersion":1,"started":"2024-04-22T06:50:56.381946486Z","completed":null,"moves":[{"move":"Start","moveDoc":"Start the process","steps":[{"step":"StartFresh","stepDoc":"Start a mongo instance (start fresh)","isWaitStep":false,"started":"2024-04-22T06:50:56.381976336Z","completed":null,"result":"error"}]},{"move":"WaitAllRsMembersUp","moveDoc":"Wait until all members of this process' repl set are up","steps":[{"step":"WaitAllRsMembersUp","stepDoc":"Wait until all members of this process' repl set are up","isWaitStep":true,"started":null,"completed":null,"result":""}]},{"move":"RsInit","moveDoc":"Initialize a replica set including the current MongoDB process","steps":[{"step":"RsInit","stepDoc":"Initialize a replica set","isWaitStep":false,"started":null,"completed":null,"result":""}]},{"move":"WaitFeatureCompatibilityVersionCorrect","moveDoc":"Wait for featureCompatibilityVersion to be right","steps":[{"step":"WaitFeatureCompatibilityVersionCorrect","stepDoc":"Wait for featureCompatibilityVersion to be right","isWaitStep":true,"started":null,"completed":null,"result":""}]}]}],"errorCode":0,"errorString":""}}}%
@nammn Were you able to check the issue?
FYI, these are the mongodb-agent and readinessprobe images that I am using.
- image: mongodb/mongodb-agent
mediaType: application/vnd.docker.distribution.manifest.v2
digest: sha256:a208e80f79bb7fe954d9a9a1444bb482dee2e86e5e5ae89dbf240395c4a158b3
tag: 107.0.0.8465-1
platform:
architecture: amd64
os: linux
registries:
- host: quay.io
- image: mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook
mediaType: application/vnd.docker.distribution.manifest.v2
digest: sha256:08495e1331a1691878e449d971129ed17858a20a7b69bb74d2e84f057cfcc098
tag: 1.0.8
platform:
architecture: amd64
os: linux
registries:
- host: quay.io
- image: mongodb/mongodb-kubernetes-operator
mediaType: application/vnd.docker.distribution.manifest.v2
digest: sha256:0aa26010be99caaf8a7dfd9cba81e326261ed99a69ac68b54aa8af3a104970bc
tag: 0.9.0
platform:
architecture: amd64
os: linux
registries:
- host: quay.io
- image: mongodb/mongodb-kubernetes-readinessprobe
mediaType: application/vnd.docker.distribution.manifest.v2
digest: sha256:e84438c5394be7223de27478eb9066204d62e6ecd233d3d4e4c11d3da486a7b5
tag: 1.0.17
platform:
architecture: amd64
os: linux
registries:
- host: quay.io
Having the exact same issue here. Fresh new instance.
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
creationTimestamp: "2024-05-13T09:11:50Z"
generation: 1
name: wildduck-wildduck-mongo
namespace: solidite-mail
resourceVersion: "192255239"
uid: 7cfab3a7-30ac-434a-b65d-31b638229bde
spec:
additionalMongodConfig:
storage.wiredTiger.engineConfig.cacheSizeGB: 1
members: 1
security:
authentication:
ignoreUnknownUsers: true
modes:
- SCRAM
statefulSet:
spec:
template:
metadata:
annotations:
k8up.io/backupcommand: sh -c 'mongodump --username=$MONGODB_USER --password=$MONGODB_PASSWORD
mongodb://localhost/$MONGODB_NAME --archive'
k8up.io/file-extension: .archive
spec:
containers:
- env:
- name: MONGODB_NAME
value: wildduck
- name: MONGODB_USER
value: wildduck
- name: MONGODB_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: wildduck-wildduck-mongo
imagePullPolicy: IfNotPresent
name: mongod
resources:
limits:
cpu: "1"
memory: 1100M
requests:
cpu: "0.3"
memory: 400M
type: ReplicaSet
users:
- db: wildduck
name: wildduck
passwordSecretRef:
name: wildduck-wildduck-mongo
roles:
- db: wildduck
name: readWrite
scramCredentialsSecretName: wildduck-wildduck-mongo-scram
version: 6.0.13
status:
currentMongoDBMembers: 0
currentStatefulSetReplicas: 0
message: ReplicaSet is not yet ready, retrying in 10 seconds
mongoUri: ""
phase: Pending
Describing the pod shows the following errors:
Warning Unhealthy 21m (x3 over 21m) kubelet Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory
goroutine 1 [running]:
main.main()
/workspace/cmd/readiness/main.go:226 +0x191
Warning BackOff 11m (x46 over 21m) kubelet Back-off restarting failed container mongod in pod wildduck-wildduck-mongo-0_solidite-mail(ba2270f0-ecf0-4468-b01f-b7a5df538b4b)
Warning Unhealthy 76s (x223 over 21m) kubelet Readiness probe failed:
The pod logs contain nothing relevant.
@prachiwaghulkar can you verify that the MongoDB image you are using is indeed compatible and working? Looking at the agent log, the agent waits forever because mongod and the related service never come up.
Can you somehow get a debug container running and try to access that service?
staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017
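For example, something like this (an untested sketch: it uses mongo:5.0.26 from Docker Hub for the debug pod, and the TLS flags are only there because the config above sets mode requireTLS; tlsAllowInvalidCertificates skips CA validation for this connectivity test):
kubectl run mongo-debug -n staging --rm -it --restart=Never --image=mongo:5.0.26 --command -- mongosh "mongodb://staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017" --tls --tlsAllowInvalidCertificates --eval 'db.runCommand({ ping: 1 })'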
Facing the same issue: the readiness probe is failing for the mongodb-agent container of the MongoDB instance.
MongoDB Community operator version: community-operator-0.9.0
OpenShift version: 4.14.25
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
name: mongodb-devops-test
namespace: di-devops
spec:
additionalConnectionStringConfig:
readPreference: primary
additionalMongodConfig:
storage.wiredTiger.engineConfig.journalCompressor: zlib
members: 3
security:
authentication:
ignoreUnknownUsers: true
modes:
- SCRAM
statefulSet:
spec:
selector:
matchLabels:
app.kubernetes.io/name: mongodb
template:
metadata:
labels:
app.kubernetes.io/name: mongodb
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- mongodb
topologyKey: kubernetes.io/hostname
weight: 100
containers:
- name: mongod
resources:
limits:
cpu: '0.2'
memory: 250M
requests:
cpu: '0.2'
memory: 200M
- name: mongodb-agent
readinessProbe:
failureThreshold: 40
initialDelaySeconds: 5
timeout: 30
resources:
limits:
cpu: '0.2'
memory: 250M
requests:
cpu: '0.2'
memory: 200M
initContainers:
- name: mongodb-agent-readinessprobe
resources:
limits:
cpu: '2'
memory: 200M
requests:
cpu: '1'
memory: 100M
type: ReplicaSet
users:
- additionalConnectionStringConfig:
readPreference: secondary
db: didevops
name: didevops
passwordSecretRef:
name: my-user-password
roles:
- db: didevops
name: clusterAdmin
- db: didevops
name: userAdminAnyDatabase
- db: didveops
name: readWriteAnyDatabase
scramCredentialsSecretName: my-scram
version: 6.0.5
status:
currentMongoDBMembers: 3
currentStatefulSetReplicas: 3
message: 'ReplicaSet is not yet ready, retrying in 10 seconds'
mongoUri: 'mongodb://mongodb-devops-test-0.mongodb-devops-test-svc.di-devops.svc.cluster.local:27017,mongodb-devops-test-1.mongodb-devops-test-svc.di-devops.svc.cluster.local:27017,mongodb-devops-test-2.mongodb-devops-test-svc.di-devops.svc.cluster.local:27017/?replicaSet=mongodb-devops-test&readPreference=primary'
phase: Pending
version: 6.0.5
Ensure that your node has the correct CPU model available. MongoDB requires AVX support. I didn't expose the CPU flag or use host CPU model passthrough, causing Mongo not to start.
How can I ensure that the node has the correct CPU model available for an OpenShift pod? Are there any docs or commands that can help check what it supports?
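The avx flag shows up in /proc/cpuinfo, which reflects the host CPU even from inside a pod; on OpenShift a node debug shell is enough (a sketch; replace <node-name> with the node your pod is scheduled on):
oc debug node/<node-name> -- grep -m1 -o 'avx[^ ]*' /proc/cpuinfo
No output means the node's CPU (or the VM's configured CPU model) does not expose AVX, and MongoDB 5.0+ will not start on it.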