Upgrade from 6.1.33 to 7.0.36 fails on missing etcd.bak file when using custom install directories
ulysseskan opened this issue · 1 comment
Description
Similar to #2006, upgrading from 6.1.33 to 7.0.36 fails with the following error if 6.1.33 was installed with a custom state-dir:
[ERROR]: open /opt/app/gravity/site/update/etcd.bak: no such file or directory
This is a regression from the fix in issue #2006, even though our CI tests upgrades with a custom state dir.
What happened:
Upgrade fails as described above
What you expected to happen:
Upgrade proceeds
How to reproduce it (as minimally and precisely as possible):
- sudo ./gravity install --cloud-provider=generic --mount=data:/opt/app --state-dir=/opt/app/gravity (install 6.1.33)
- sudo ./upload (upload new 7.0.36 cluster image)
- sudo ./gravity upgrade
Environment
- Gravity version: 6.1.33 to 7.0.36
- OS: Ubuntu Bionic
- Platform: Local
Relevant Debug Logs If Applicable
2022-04-26T01:20:33Z WARN [UPDATE] Failed to execute plan. error:[
ERROR REPORT:
Original Error: *exec.ExitError exit status 255
Fields:
output: [ERROR]: open /opt/app/gravity/site/update/etcd.bak: no such file or directory
Stack Trace:
/gopath/src/github.com/gravitational/gravity/lib/utils/exec.go:137 github.com/gravitational/gravity/lib/utils.RunStream
/gopath/src/github.com/gravitational/gravity/lib/utils/exec.go:98 github.com/gravitational/gravity/lib/utils.RunCommand
/gopath/src/github.com/gravitational/gravity/lib/utils/exec.go:83 github.com/gravitational/gravity/lib/utils.RunPlanetCommand
/gopath/src/github.com/gravitational/gravity/lib/update/cluster/phases/etcd.go:78 github.com/gravitational/gravity/lib/update/cluster/phases.(*PhaseUpgradeEtcdBackup).Execute
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:512 github.com/gravitational/gravity/lib/fsm.(*FSM).executeOnePhase
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:444 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhaseLocally
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:404 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhase
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:246 github.com/gravitational/gravity/lib/fsm.(*FSM).ExecutePhase
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:455 github.com/gravitational/gravity/lib/fsm.(*FSM).executeSubphasesSequentially
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:449 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhaseLocally
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:376 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhase
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:246 github.com/gravitational/gravity/lib/fsm.(*FSM).ExecutePhase
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:455 github.com/gravitational/gravity/lib/fsm.(*FSM).executeSubphasesSequentially
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:449 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhaseLocally
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:376 github.com/gravitational/gravity/lib/fsm.(*FSM).executePhase
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:246 github.com/gravitational/gravity/lib/fsm.(*FSM).ExecutePhase
/gopath/src/github.com/gravitational/gravity/lib/fsm/fsm.go:175 github.com/gravitational/gravity/lib/fsm.(*FSM).ExecutePlan
/gopath/src/github.com/gravitational/gravity/lib/update/updater.go:216 github.com/gravitational/gravity/lib/update.(*Updater).executePlan
/gopath/src/github.com/gravitational/gravity/lib/update/updater.go:61 github.com/gravitational/gravity/lib/update.(*Updater).Run.func1
/go/src/runtime/asm_amd64.s:1581 runtime.goexit
User Message: failed to execute phase "/etcd"
failed to backup etcd
exit status 255] operation:operation(update(431e6db1-650a-4a8f-ad18-d02cec71c936), cluster=agitatedlewin3627, state=update_in_progress, created=Tue Apr 26 01:17 UTC) utils/logging.go:103
I think the fix in #2010 was never ported outside of 6.1.x.
In 6.1.x we have gravity/lib/update/cluster/phases/etcd.go, lines 76 to 78 in 78e69a1.
In 7.0.x we have gravity/lib/update/cluster/phases/etcd.go, lines 334 to 340 in 449ffca.
I don't fully understand how the change fixes the issue, but with the change applied the upgrade from 6.1.33 to 7.0.38-dev seems to complete.
$ sudo ./gravity plan
Phase Description State Node Requires Updated
----- ----------- ----- ---- -------- -------
✓ init Initialize update operation Completed - - Tue Apr 26 23:02 UTC
✓ robotest-f168927b-node-0 Initialize node "robotest-f168927b-node-0" Completed 10.138.0.66 - Tue Apr 26 23:02 UTC
✓ checks Run preflight checks Completed - /init Tue Apr 26 23:02 UTC
✓ pre-update Run pre-update application hook Completed - /init,/checks Tue Apr 26 23:02 UTC
✓ bootstrap Bootstrap update operation on nodes Completed - /checks,/pre-update Tue Apr 26 23:02 UTC
✓ robotest-f168927b-node-0 Bootstrap node "robotest-f168927b-node-0" Completed 10.138.0.66 - Tue Apr 26 23:02 UTC
✓ coredns Provision CoreDNS resources Completed - /bootstrap Tue Apr 26 23:02 UTC
✓ masters Update master nodes Completed - /coredns Tue Apr 26 23:04 UTC
✓ robotest-f168927b-node-0 Update system software on master node "robotest-f168927b-node-0" Completed - - Tue Apr 26 23:04 UTC
✓ drain Drain node "robotest-f168927b-node-0" Completed 10.138.0.66 - Tue Apr 26 23:02 UTC
✓ system-upgrade Update system software on node "robotest-f168927b-node-0" Completed 10.138.0.66 /masters/robotest-f168927b-node-0/drain Tue Apr 26 23:03 UTC
✓ health Health check node "robotest-f168927b-node-0" Completed - /masters/robotest-f168927b-node-0/system-upgrade Tue Apr 26 23:03 UTC
✓ taint Taint node "robotest-f168927b-node-0" Completed 10.138.0.66 /masters/robotest-f168927b-node-0/health Tue Apr 26 23:03 UTC
✓ uncordon Uncordon node "robotest-f168927b-node-0" Completed 10.138.0.66 /masters/robotest-f168927b-node-0/taint Tue Apr 26 23:03 UTC
✓ endpoints Wait for DNS/cluster endpoints on "robotest-f168927b-node-0" Completed 10.138.0.66 /masters/robotest-f168927b-node-0/uncordon Tue Apr 26 23:03 UTC
✓ untaint Remove taint from node "robotest-f168927b-node-0" Completed 10.138.0.66 /masters/robotest-f168927b-node-0/endpoints Tue Apr 26 23:04 UTC
✓ etcd Upgrade etcd 3.3.22 to 3.4.9 Completed - - Tue Apr 26 23:05 UTC
✓ backup Backup etcd data Completed - - Tue Apr 26 23:04 UTC
✓ robotest-f168927b-node-0 Backup etcd on node "robotest-f168927b-node-0" Completed - - Tue Apr 26 23:04 UTC
✓ shutdown Shutdown etcd cluster Completed - - Tue Apr 26 23:04 UTC
✓ robotest-f168927b-node-0 Shutdown etcd on node "robotest-f168927b-node-0" Completed - /etcd/backup/robotest-f168927b-node-0 Tue Apr 26 23:04 UTC
✓ upgrade Upgrade etcd servers Completed - - Tue Apr 26 23:04 UTC
✓ robotest-f168927b-node-0 Upgrade etcd on node "robotest-f168927b-node-0" Completed - /etcd/shutdown/robotest-f168927b-node-0 Tue Apr 26 23:04 UTC
✓ migrate Migrate etcd data to new version Completed - - Tue Apr 26 23:04 UTC
✓ robotest-f168927b-node-0 Migrate etcd data to version 3.4.9 on node "robotest-f168927b-node-0" Completed - /etcd/upgrade/robotest-f168927b-node-0 Tue Apr 26 23:04 UTC
✓ restart Restart etcd servers Completed - - Tue Apr 26 23:05 UTC
✓ robotest-f168927b-node-0 Restart etcd on node "robotest-f168927b-node-0" Completed - /etcd/migrate/robotest-f168927b-node-0 Tue Apr 26 23:04 UTC
✓ gravity-site Restart gravity-site service Completed - - Tue Apr 26 23:05 UTC
✓ config Update system configuration on nodes Completed - /etcd Tue Apr 26 23:05 UTC
✓ robotest-f168927b-node-0 Update system configuration on node "robotest-f168927b-node-0" Completed - - Tue Apr 26 23:05 UTC
✓ openebs Create OpenEBS configuration Completed 10.138.0.66 /config Tue Apr 26 23:05 UTC
✓ runtime Update application runtime Completed - /openebs Tue Apr 26 23:12 UTC
✓ rbac-app Update system application "rbac-app" to 7.0.38-dev.2 Completed - - Tue Apr 26 23:05 UTC
✓ dns-app Update system application "dns-app" to 7.0.4 Completed - /runtime/rbac-app Tue Apr 26 23:06 UTC
✓ storage-app Update system application "storage-app" to 0.0.3 Completed - /runtime/dns-app Tue Apr 26 23:08 UTC
✓ logging-app Update system application "logging-app" to 7.0.1 Completed - /runtime/storage-app Tue Apr 26 23:09 UTC
✓ monitoring-app Update system application "monitoring-app" to 7.0.12 Completed - /runtime/logging-app Tue Apr 26 23:11 UTC
✓ tiller-app Update system application "tiller-app" to 7.0.2 Completed - /runtime/monitoring-app Tue Apr 26 23:11 UTC
✓ site Update system application "site" to 7.0.38-dev.2 Completed - /runtime/tiller-app Tue Apr 26 23:12 UTC
✓ kubernetes Update system application "kubernetes" to 7.0.38-dev.2 Completed - /runtime/site Tue Apr 26 23:12 UTC
✓ migration Perform system database migration Completed - /runtime Tue Apr 26 23:12 UTC
✓ labels Update node labels Completed - - Tue Apr 26 23:12 UTC
✓ app Update installed application Completed - /migration Tue Apr 26 23:12 UTC
✓ telekube Update application "telekube" to 7.0.38-dev.2 Completed - - Tue Apr 26 23:12 UTC
✓ gc Run cleanup tasks Completed - /app Tue Apr 26 23:12 UTC
✓ robotest-f168927b-node-0 Clean up node "robotest-f168927b-node-0" Completed - - Tue Apr 26 23:12 UTC