Upgrade and pod rescheduling failing with TLS
deusebio commented
During upgrade tests, we noticed that TLS-related files are not correctly recreated when a pod is rescheduled, which prevents the cluster from recovering.
Steps to reproduce
- build from 5d9bb11e61174f4680ff9effeb56c7be18b03c18
juju deploy ./zookeeper-k8s_ubuntu-22.04-amd64.charm -n 3 --trust --resource zookeeper-image=ghcr.io/canonical/charmed-zookeeper@sha256:dbdbd8367bf6d813b9aae1e15a6c1743f909db7555a47995b6b5d259e87f2af1
juju deploy self-signed-certificates
juju relate zookeeper-k8s self-signed-certificates
juju run zookeeper-k8s/leader pre-upgrade-check --format yaml
Running operation 1 with 1 task
  - task 2 on unit-zookeeper-k8s-0

Waiting for task 2...
zookeeper-k8s/0:
  id: "2"
  results:
    return-code: 0
  status: completed
  timing:
    completed: 2024-02-16 11:52:50 +0000 UTC
    enqueued: 2024-02-16 11:52:48 +0000 UTC
    started: 2024-02-16 11:52:48 +0000 UTC
  unit: zookeeper-k8s/0
juju refresh zookeeper-k8s --path ./zookeeper-k8s_ubuntu-22.04-amd64.charm
Expected behavior
The upgrade works fine and the units recover from pod rescheduling.
Actual behavior
Juju status at the end:
...
Unit                         Workload  Agent  Address      Ports  Message
self-signed-certificates/0*  active    idle   10.1.63.209
zookeeper-k8s/0*             active    idle   10.1.63.227
zookeeper-k8s/1              active    idle   10.1.63.232
zookeeper-k8s/2              blocked   idle   10.1.63.231
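For anyone scripting this check in an upgrade test, the unit rows above could be parsed roughly as follows. This is only a sketch: `UnitStatus`, `parse_units` and `not_active` are hypothetical helper names, not part of Juju, and the regex assumes the whitespace-separated column layout shown in the table, with the Address column always populated.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Matches unit rows such as "zookeeper-k8s/2 blocked idle 10.1.63.231".
# A trailing "*" on the unit name marks the leader.
UNIT_ROW = re.compile(
    r"^\s*(?P<name>[a-z0-9-]+/\d+)(?P<leader>\*?)\s+"
    r"(?P<workload>\S+)\s+(?P<agent>\S+)\s+(?P<address>\S+)"
)


class UnitStatus(NamedTuple):
    name: str
    leader: bool
    workload: str
    agent: str
    address: str


def parse_units(status_text: str) -> list[UnitStatus]:
    """Return one UnitStatus per unit row; headers and other lines are skipped."""
    units = []
    for line in status_text.splitlines():
        match = UNIT_ROW.match(line)
        if match:
            units.append(
                UnitStatus(
                    name=match["name"],
                    leader=match["leader"] == "*",
                    workload=match["workload"],
                    agent=match["agent"],
                    address=match["address"],
                )
            )
    return units


def not_active(units: list[UnitStatus]) -> list[UnitStatus]:
    """Units whose workload status is anything other than 'active'."""
    return [unit for unit in units if unit.workload != "active"]
```

Fed the table above, `not_active(parse_units(...))` should single out `zookeeper-k8s/2`, the unit that stays blocked after the refresh.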
...
Juju debug-log:
unit-zookeeper-k8s-2: 11:56:14 INFO unit.zookeeper-k8s/2.juju-log Running legacy hooks/upgrade-charm.
unit-zookeeper-k8s-2: 11:56:16 INFO unit.zookeeper-k8s/2.juju-log zookeeper-k8s/2 initializing...
unit-zookeeper-k8s-2: 11:56:17 INFO unit.zookeeper-k8s/2.juju-log zookeeper-k8s/2 started
unit-self-signed-certificates-0: 11:56:24 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-zookeeper-k8s-1: 11:56:41 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-zookeeper-k8s-2: 11:58:00 ERROR unit.zookeeper-k8s/2.juju-log Not all application units are connected and broadcasting in the quorum
unit-zookeeper-k8s-2: 11:58:00 CRITICAL unit.zookeeper-k8s/2.juju-log Unit failed to upgrade and requires manual rollback to previous stable version.
1. Re-run `pre-upgrade-check` action on the leader unit to enter 'recovery' state
2. Run `juju refresh` to the previously deployed charm revision
unit-zookeeper-k8s-2: 11:58:00 INFO juju.worker.uniter.operation ran "upgrade-charm" hook (via hook dispatching script: dispatch)
unit-zookeeper-k8s-2: 11:58:00 INFO juju.worker.uniter found queued "config-changed" hook
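The ERROR and CRITICAL entries are the relevant ones here. A small filter like the one below could pull them out of a longer debug-log capture. It is a sketch under stated assumptions: the line shape (`<entity>: HH:MM:SS LEVEL <module> <message>`) is inferred from the excerpt above, lines that do not match it are treated as continuations of the previous message, and `LogRecord`, `parse_debug_log` and `problems` are hypothetical names.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Line shape assumed from the excerpt: "<entity>: HH:MM:SS LEVEL <module> <message>".
LOG_LINE = re.compile(
    r"^(?P<entity>[\w-]+): (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<module>\S+) (?P<message>.*)$"
)


class LogRecord(NamedTuple):
    entity: str
    time: str
    level: str
    module: str
    message: str


def parse_debug_log(text: str) -> list[LogRecord]:
    """Parse debug-log text, folding continuation lines into the previous record."""
    records: list[LogRecord] = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            records.append(LogRecord(**match.groupdict()))
        elif records and line.strip():
            # Not a new log line: treat it as part of the previous message.
            last = records[-1]
            records[-1] = last._replace(message=last.message + "\n" + line.strip())
    return records


def problems(records: list[LogRecord], levels=("ERROR", "CRITICAL")) -> list[LogRecord]:
    """Keep only records whose level is in `levels`."""
    return [record for record in records if record.level in levels]
```

With the excerpt above, this keeps the two 11:58:00 entries from `unit-zookeeper-k8s-2`, and the rollback instructions stay attached to the CRITICAL message.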
Zookeeper logs show:
2024-02-16T12:07:59.538Z [zookeeper] 12:07:59.538 [QuorumConnectionThread-[myid=3]-25] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - Opening channel to server 2
2024-02-16T12:07:59.538Z [zookeeper] 12:07:59.538 [QuorumConnectionThread-[myid=3]-25] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open secure channel to 2 at election address zookeeper-k8s-1.zookeeper-k8s-endpoints/10.1.63.232:3888
2024-02-16T12:07:59.538Z [zookeeper] org.apache.zookeeper.common.X509Exception$SSLContextException: Failed to create KeyManager
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.common.X509Util.createSSLContextAndOptionsFromConfig(X509Util.java:371)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.common.X509Util.createSSLContextAndOptions(X509Util.java:349)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.common.X509Util.createSSLContextAndOptions(X509Util.java:303)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.common.X509Util.getDefaultSSLContextAndOptions(X509Util.java:283)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.common.X509Util.createSSLSocket(X509Util.java:574)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:379)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
2024-02-16T12:07:59.538Z [zookeeper] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2024-02-16T12:07:59.538Z [zookeeper] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2024-02-16T12:07:59.538Z [zookeeper] at java.base/java.lang.Thread.run(Thread.java:833)
2024-02-16T12:07:59.538Z [zookeeper] Caused by: org.apache.zookeeper.common.X509Exception$KeyManagerException: java.io.FileNotFoundException: /etc/zookeeper/keystore.p12 (No such file or directory)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:492)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.common.X509Util.createSSLContextAndOptionsFromConfig(X509Util.java:369)
2024-02-16T12:07:59.538Z [zookeeper] ... 9 common frames omitted
2024-02-16T12:07:59.538Z [zookeeper] Caused by: java.io.FileNotFoundException: /etc/zookeeper/keystore.p12 (No such file or directory)
2024-02-16T12:07:59.538Z [zookeeper] at java.base/java.io.FileInputStream.open0(Native Method)
2024-02-16T12:07:59.538Z [zookeeper] at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
2024-02-16T12:07:59.538Z [zookeeper] at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.common.StandardTypeFileKeyStoreLoader.loadKeyStore(StandardTypeFileKeyStoreLoader.java:53)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.common.X509Util.loadKeyStore(X509Util.java:425)
2024-02-16T12:07:59.538Z [zookeeper] at org.apache.zookeeper.common.X509Util.createKeyManager(X509Util.java:481)
2024-02-16T12:07:59.538Z [zookeeper] ... 10 common frames omitted
root@zookeeper-k8s-2:/# cd /etc/zookeeper/
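To confirm which files the workload cannot find after rescheduling, the stack trace can be scanned for `FileNotFoundException` causes. The snippet below is a minimal sketch, assuming the message format shown in the trace above and paths without spaces. `missing_files` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Matches e.g. "java.io.FileNotFoundException: /etc/zookeeper/keystore.p12 (No such file or directory)".
MISSING_FILE = re.compile(
    r"java\.io\.FileNotFoundException: (?P<path>\S+) \(No such file or directory\)"
)


def missing_files(log_text: str) -> list[str]:
    """Return each missing path once, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in MISSING_FILE.finditer(log_text):
        seen.setdefault(match["path"], None)
    return list(seen)
```

On the ZooKeeper log above this reports only `/etc/zookeeper/keystore.p12`, even though the exception is repeated in the cause chain.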
Versions
Operating system: Ubuntu 22.04 LTS
Juju CLI: 3.1.7
Juju agent: 3.1.7
Charm revision: upgrading from 41 to 50
microk8s: 1.29-strict/stable (installed: v1.29.0 (6370) 168MB)