gravitational/gravity

Upgrade from 6.1.33 to 7.0.36 fails on missing etcd.bak file when using custom install directories

ulysseskan opened this issue · 1 comment

Description

Similar to #2006: upgrading from 6.1.33 to 7.0.36 fails with the following error if 6.1.33 was installed with a custom state-dir:
[ERROR]: open /opt/app/gravity/site/update/etcd.bak: no such file or directory

This is a regression of the fix for #2006, even though our CI tests upgrades with a custom state dir.

What happened:
Upgrade fails as described above

What you expected to happen:
Upgrade proceeds

How to reproduce it (as minimally and precisely as possible):

  1. sudo ./gravity install --cloud-provider=generic --mount=data:/opt/app --state-dir=/opt/app/gravity (install 6.1.33)
  2. sudo ./upload (upload new 7.0.36 cluster image)
  3. sudo ./gravity upgrade

Environment

  • Gravity version [e.g. 7.0.11]: 6.1.33 to 7.0.36
  • OS [e.g. Redhat 7.4]: Ubuntu Bionic
  • Platform [e.g. Vmware, AWS]: Local

Relevant Debug Logs If Applicable

2022-04-26T01:20:33Z WARN [UPDATE]    Failed to execute plan. error:[
ERROR REPORT:
Original Error: *exec.ExitError exit status 255
Fields:
  output: [ERROR]: open /opt/app/gravity/site/update/etcd.bak: no such file or directory

Stack Trace:
        /gopath/src/github.com/gravitational/gravity/lib/utils/exec.go:137 github.com/gravitational/gravity/lib/utils.RunStream
        /gopath/src/github.com/gravitational/gravity/lib/utils/exec.go:98 github.com/gravitational/gravity/lib/utils.RunCommand
        /gopath/src/github.com/gravitational/gravity/lib/utils/exec.go:83 github.com/gravitational/gravity/lib/utils.RunPlanetCommand
        /gopath/src/github.com/gravitational/gravity/lib/update/cluster/phases/etcd.go:78 github.com/gravitational/gravity/lib/update/cluster/phases.(*PhaseUpgradeEtcdBackup).Execute
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:512 github.com/gravitational/gravity/lib/fsm.(*FSM).executeOnePhase
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:444 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhaseLocally
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:404 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhase
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:246 github.com/gravitational/gravity/lib/fsm.(*FSM).ExecutePhase
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:455 github.com/gravitational/gravity/lib/fsm.(*FSM).executeSubphasesSequentially
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:449 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhaseLocally
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:376 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhase
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:246 github.com/gravitational/gravity/lib/fsm.(*FSM).ExecutePhase
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:455 github.com/gravitational/gravity/lib/fsm.(*FSM).executeSubphasesSequentially
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:449 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhaseLocally
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:376 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhase
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:246 github.com/gravitational/gravity/lib/fsm.(*FSM).ExecutePhase
        /gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:175 github.com/gravitational/gravity/lib/fsm.(*FSM).ExecutePlan
        /gopath/src/github.com/gravitational/gravity/lib/update/updater.go:216 github.com/gravitational/gravity/lib/update.(*Updater).executePlan
        /gopath/src/github.com/gravitational/gravity/lib/update/updater.go:61 github.com/gravitational/gravity/lib/update.(*Updater).Run.func1
        /go/src/runtime/asm_amd64.s:1581 runtime.goexit
User Message: failed to execute phase "/etcd"
        failed to backup etcd
                exit status 255] operation:operation(update(431e6db1-650a-4a8f-ad18-d02cec71c936), cluster=agitatedlewin3627, state=update_in_progress, created=Tue Apr 26 01:17 UTC) utils/logging.go:103

I think the fix in #2010 was never ported outside of 6.1.x.

In 6.1.x we have:

func backupFile() (path string) {
	return filepath.Join(state.GravityUpdateDir(defaults.GravityDir), defaults.EtcdUpgradeBackupFile)
}

In 7.0.x we have:

func backupFile() (string, error) {
	stateDir, err := state.GetStateDir()
	if err != nil {
		return "", trace.Wrap(err)
	}
	return filepath.Join(state.GravityUpdateDir(stateDir), defaults.EtcdUpgradeBackupFile), nil
}
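
To make the difference between the two variants concrete, here is a minimal standalone sketch of the paths they would produce. It does not use the real gravity packages: `gravityDir` is an assumed value for `defaults.GravityDir`, `gravityUpdateDir` is a hypothetical stand-in for `state.GravityUpdateDir` whose `site/update` layout is inferred from the error message, and the custom state dir is hard-coded instead of being read via `state.GetStateDir()`.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Stand-ins for values from gravity's defaults package.
const (
	// Assumption: the default state directory that defaults.GravityDir refers to.
	gravityDir = "/var/lib/gravity"
	// File name taken from the error message in this issue.
	etcdUpgradeBackupFile = "etcd.bak"
)

// gravityUpdateDir is a hypothetical stand-in for state.GravityUpdateDir.
// The "site/update" layout is inferred from the failing path in the log.
func gravityUpdateDir(baseDir string) string {
	return filepath.Join(baseDir, "site", "update")
}

// backupFile61 mirrors the 6.1.x variant: always relative to the default dir.
func backupFile61() string {
	return filepath.Join(gravityUpdateDir(gravityDir), etcdUpgradeBackupFile)
}

// backupFile70 mirrors the 7.0.x variant: relative to the configured state dir,
// which is passed in here instead of being looked up.
func backupFile70(stateDir string) string {
	return filepath.Join(gravityUpdateDir(stateDir), etcdUpgradeBackupFile)
}

func main() {
	customStateDir := "/opt/app/gravity" // --state-dir from the reproduction steps

	fmt.Println("6.1.x:", backupFile61())
	fmt.Println("7.0.x:", backupFile70(customStateDir))
}
```

With the custom state dir from the reproduction steps, the 7.0.x variant yields exactly the path from the error message. Since the stack trace goes through `RunPlanetCommand`, one possible (unverified) reading is that the path is opened inside the planet container, where the host's custom state dir may not be visible under the same path.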

I don't fully understand how the change fixes the issue, but with it applied the upgrade from 6.1.33 to 7.0.38-dev completes:

$ sudo ./gravity plan
Phase                              Description                                                               State         Node            Requires                                             Updated
-----                              -----------                                                               -----         ----            --------                                             -------
✓ init                             Initialize update operation                                               Completed     -               -                                                    Tue Apr 26 23:02 UTC
  ✓ robotest-f168927b-node-0       Initialize node "robotest-f168927b-node-0"                                Completed     10.138.0.66     -                                                    Tue Apr 26 23:02 UTC
✓ checks                           Run preflight checks                                                      Completed     -               /init                                                Tue Apr 26 23:02 UTC
✓ pre-update                       Run pre-update application hook                                           Completed     -               /init,/checks                                        Tue Apr 26 23:02 UTC
✓ bootstrap                        Bootstrap update operation on nodes                                       Completed     -               /checks,/pre-update                                  Tue Apr 26 23:02 UTC
  ✓ robotest-f168927b-node-0       Bootstrap node "robotest-f168927b-node-0"                                 Completed     10.138.0.66     -                                                    Tue Apr 26 23:02 UTC
✓ coredns                          Provision CoreDNS resources                                               Completed     -               /bootstrap                                           Tue Apr 26 23:02 UTC
✓ masters                          Update master nodes                                                       Completed     -               /coredns                                             Tue Apr 26 23:04 UTC
  ✓ robotest-f168927b-node-0       Update system software on master node "robotest-f168927b-node-0"          Completed     -               -                                                    Tue Apr 26 23:04 UTC
    ✓ drain                        Drain node "robotest-f168927b-node-0"                                     Completed     10.138.0.66     -                                                    Tue Apr 26 23:02 UTC
    ✓ system-upgrade               Update system software on node "robotest-f168927b-node-0"                 Completed     10.138.0.66     /masters/robotest-f168927b-node-0/drain              Tue Apr 26 23:03 UTC
    ✓ health                       Health check node "robotest-f168927b-node-0"                              Completed     -               /masters/robotest-f168927b-node-0/system-upgrade     Tue Apr 26 23:03 UTC
    ✓ taint                        Taint node "robotest-f168927b-node-0"                                     Completed     10.138.0.66     /masters/robotest-f168927b-node-0/health             Tue Apr 26 23:03 UTC
    ✓ uncordon                     Uncordon node "robotest-f168927b-node-0"                                  Completed     10.138.0.66     /masters/robotest-f168927b-node-0/taint              Tue Apr 26 23:03 UTC
    ✓ endpoints                    Wait for DNS/cluster endpoints on "robotest-f168927b-node-0"              Completed     10.138.0.66     /masters/robotest-f168927b-node-0/uncordon           Tue Apr 26 23:03 UTC
    ✓ untaint                      Remove taint from node "robotest-f168927b-node-0"                         Completed     10.138.0.66     /masters/robotest-f168927b-node-0/endpoints          Tue Apr 26 23:04 UTC
✓ etcd                             Upgrade etcd 3.3.22 to 3.4.9                                              Completed     -               -                                                    Tue Apr 26 23:05 UTC
  ✓ backup                         Backup etcd data                                                          Completed     -               -                                                    Tue Apr 26 23:04 UTC
    ✓ robotest-f168927b-node-0     Backup etcd on node "robotest-f168927b-node-0"                            Completed     -               -                                                    Tue Apr 26 23:04 UTC
  ✓ shutdown                       Shutdown etcd cluster                                                     Completed     -               -                                                    Tue Apr 26 23:04 UTC
    ✓ robotest-f168927b-node-0     Shutdown etcd on node "robotest-f168927b-node-0"                          Completed     -               /etcd/backup/robotest-f168927b-node-0                Tue Apr 26 23:04 UTC
  ✓ upgrade                        Upgrade etcd servers                                                      Completed     -               -                                                    Tue Apr 26 23:04 UTC
    ✓ robotest-f168927b-node-0     Upgrade etcd on node "robotest-f168927b-node-0"                           Completed     -               /etcd/shutdown/robotest-f168927b-node-0              Tue Apr 26 23:04 UTC
  ✓ migrate                        Migrate etcd data to new version                                          Completed     -               -                                                    Tue Apr 26 23:04 UTC
    ✓ robotest-f168927b-node-0     Migrate etcd data to version 3.4.9 on node "robotest-f168927b-node-0"     Completed     -               /etcd/upgrade/robotest-f168927b-node-0               Tue Apr 26 23:04 UTC
  ✓ restart                        Restart etcd servers                                                      Completed     -               -                                                    Tue Apr 26 23:05 UTC
    ✓ robotest-f168927b-node-0     Restart etcd on node "robotest-f168927b-node-0"                           Completed     -               /etcd/migrate/robotest-f168927b-node-0               Tue Apr 26 23:04 UTC
    ✓ gravity-site                 Restart gravity-site service                                              Completed     -               -                                                    Tue Apr 26 23:05 UTC
✓ config                           Update system configuration on nodes                                      Completed     -               /etcd                                                Tue Apr 26 23:05 UTC
  ✓ robotest-f168927b-node-0       Update system configuration on node "robotest-f168927b-node-0"            Completed     -               -                                                    Tue Apr 26 23:05 UTC
✓ openebs                          Create OpenEBS configuration                                              Completed     10.138.0.66     /config                                              Tue Apr 26 23:05 UTC
✓ runtime                          Update application runtime                                                Completed     -               /openebs                                             Tue Apr 26 23:12 UTC
  ✓ rbac-app                       Update system application "rbac-app" to 7.0.38-dev.2                      Completed     -               -                                                    Tue Apr 26 23:05 UTC
  ✓ dns-app                        Update system application "dns-app" to 7.0.4                              Completed     -               /runtime/rbac-app                                    Tue Apr 26 23:06 UTC
  ✓ storage-app                    Update system application "storage-app" to 0.0.3                          Completed     -               /runtime/dns-app                                     Tue Apr 26 23:08 UTC
  ✓ logging-app                    Update system application "logging-app" to 7.0.1                          Completed     -               /runtime/storage-app                                 Tue Apr 26 23:09 UTC
  ✓ monitoring-app                 Update system application "monitoring-app" to 7.0.12                      Completed     -               /runtime/logging-app                                 Tue Apr 26 23:11 UTC
  ✓ tiller-app                     Update system application "tiller-app" to 7.0.2                           Completed     -               /runtime/monitoring-app                              Tue Apr 26 23:11 UTC
  ✓ site                           Update system application "site" to 7.0.38-dev.2                          Completed     -               /runtime/tiller-app                                  Tue Apr 26 23:12 UTC
  ✓ kubernetes                     Update system application "kubernetes" to 7.0.38-dev.2                    Completed     -               /runtime/site                                        Tue Apr 26 23:12 UTC
✓ migration                        Perform system database migration                                         Completed     -               /runtime                                             Tue Apr 26 23:12 UTC
  ✓ labels                         Update node labels                                                        Completed     -               -                                                    Tue Apr 26 23:12 UTC
✓ app                              Update installed application                                              Completed     -               /migration                                           Tue Apr 26 23:12 UTC
  ✓ telekube                       Update application "telekube" to 7.0.38-dev.2                             Completed     -               -                                                    Tue Apr 26 23:12 UTC
✓ gc                               Run cleanup tasks                                                         Completed     -               /app                                                 Tue Apr 26 23:12 UTC
  ✓ robotest-f168927b-node-0       Clean up node "robotest-f168927b-node-0"                                  Completed     -               -                                                    Tue Apr 26 23:12 UTC