canonical/zookeeper-k8s-operator

Upgrade from rev41 not working

Closed · 1 comment

I'm facing some issues when upgrading the charm. I'm currently using revision 41, deployed using the following bundle:

bundle: kubernetes
name: kafka-k8s-bundle
applications:
  tls:
    charm: self-signed-certificates
    channel: latest/edge
    revision: 75
    scale: 1
    options:
      ca-common-name: Canonical
    constraints: arch=amd64
  zookeeper-k8s:
    charm: zookeeper-k8s
    channel: 3/edge
    revision: 41
    scale: 3
    trust: true
    constraints: arch=amd64
    resources:
      zookeeper-image: 28
relations:
- - zookeeper-k8s:certificates
  - tls:certificates

I have identified a few strange hints/issues:

  1. The pre-upgrade-check action fails, although the logs do not show any failure; only INFO-level messages are emitted
  2. The upgrade peer-relation databag does not contain any upgrade stack, even though the logs say Building upgrade stack for VM (the message could also be improved, since we are not on a VM)
  3. If I run juju refresh anyway, the upgrade of the first unit (zookeeper-k8s/2) does not go through successfully, and the unit reports: zookeeper service is unreachable or not serving requests, effectively halting the upgrade process
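For reference, point 2 can be checked against the relation data returned by juju show-unit. Below is a minimal sketch of how I inspect the output; the "upgrade-stack" key name and databag layout are assumptions for illustration, not taken from the charm source:

```python
import json


def find_upgrade_stack(show_unit_json: str, unit: str = "zookeeper-k8s/0"):
    """Return the upgrade stack from `juju show-unit <unit> --format json`
    output, or None if absent.

    Assumes the stack lives under an "upgrade-stack" key in the
    application data of the "upgrade" peer relation (key names are
    illustrative assumptions).
    """
    data = json.loads(show_unit_json)
    for rel in data.get(unit, {}).get("relation-info", []):
        if rel.get("endpoint") == "upgrade":
            stack = rel.get("application-data", {}).get("upgrade-stack")
            return json.loads(stack) if stack else None
    return None


# Trimmed sample of what a healthy databag might look like:
sample = json.dumps({
    "zookeeper-k8s/0": {
        "relation-info": [
            {"endpoint": "upgrade",
             "application-data": {"upgrade-stack": "[0, 1, 2]"}}
        ]
    }
})
print(find_upgrade_stack(sample))  # → [0, 1, 2]
```

In my case, the equivalent lookup comes back empty: no upgrade stack is present in the databag.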

Steps to reproduce and Actual behavior

  1. juju deploy ./bundle.yaml --trust

(wait for units to come up healthy)

  2. juju run zookeeper-k8s/leader pre-upgrade-check --format yaml

The action provides the following output:

Running operation 1 with 1 task
  - task 2 on unit-zookeeper-k8s-0

Waiting for task 2...
zookeeper-k8s/0:
  id: "2"
  message: Unknown error found.
  results:
    return-code: 0
  status: failed
  timing:
    completed: 2024-02-14 23:29:05 +0000 UTC
    enqueued: 2024-02-14 23:29:03 +0000 UTC
    started: 2024-02-14 23:29:03 +0000 UTC
  unit: zookeeper-k8s/0

The upgrade peer-relation databag also does not show any upgrade stack.

  3. If I try to upgrade anyway with juju refresh zookeeper-k8s (effectively bumping to rev45), the upgrade of the first unit goes into error with the following state (from juju status):
Model  Controller  Cloud/Region        Version  SLA          Timestamp
tests  micro       microk8s/localhost  3.1.7    unsupported  23:43:54Z

App            Version  Status   Scale  Charm                     Channel      Rev  Address         Exposed  Message
tls                     active       1  self-signed-certificates  latest/edge   75  10.152.183.111  no
zookeeper-k8s           waiting      3  zookeeper-k8s             3/edge        45  10.152.183.126  no       installing agent

Unit              Workload  Agent  Address      Ports  Message
tls/0*            active    idle   10.1.63.214
zookeeper-k8s/0*  active    idle   10.1.63.203
zookeeper-k8s/1   active    idle   10.1.63.202
zookeeper-k8s/2   blocked   idle   10.1.63.207         zookeeper service is unreachable or not serving requests
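The blocked message suggests the charm's health probe cannot reach the ZooKeeper server on zookeeper-k8s/2. ZooKeeper answers the ruok four-letter-word command with imok when it is serving requests; a minimal sketch of such a probe is below (host, port, and timeout are illustrative, and note that 4lw commands only work if enabled via 4lw.commands.whitelist on the server):

```python
import socket


def zookeeper_is_ok(host: str = "localhost", port: int = 2181,
                    timeout: float = 5.0) -> bool:
    """Send ZooKeeper's `ruok` four-letter word and check for `imok`.

    An empty or missing reply does not necessarily mean the server is
    down: `ruok` must be listed in the server's 4lw.commands.whitelist.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            sock.shutdown(socket.SHUT_WR)  # signal end of request
            reply = sock.recv(16)
        return reply == b"imok"
    except OSError:
        return False
```

Running an equivalent check against the blocked unit from inside the pod could help distinguish "server down" from "charm probe misconfigured".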

Expected behavior

Upgrade goes through cleanly.

Versions

Operating system: Ubuntu 22.04 LTS

Juju CLI: 3.1.7

Juju agent: 3.1.7

Charm revision: 41 (upgrading to 50)

microk8s: 1.29-strict/stable (installed: v1.29.0, snap revision 6370)

Log output

Juju debug log (starting from the pre-upgrade-check action onwards):

logs.txt