open-horizon/anax

Bug: k8s auto upgrade fails if using all-in-one with http (ie. without https)

Closed this issue · 1 comments

Describe the bug.

Setup: all-in-one hub served over http (not https)
Push one agent version to the hub (e.g. 2.31.0-1534)
Push a newer agent version to the hub (e.g. 2.31.0-1540)
Install agent 2.31.0-1534 on an edge cluster
Create an NMP to upgrade the agent to 2.31.0-1540

Agent logs

I0617 23:31:14.178271      15 worker.go:353] CommandDispatcher: NodeManagement command processor blocking for commands
I0617 23:31:14.188107      15 node_management_status.go:40] Putting node management policy status for node myorg/my-edge-agent and policy MyNmpFor1540. Status is: AgentUpgrade: ScheduledTime: 2024-06-17T23:26:06Z, ActualStartTime: 2024-06-17T23:26:31Z, CompletionTime: , UpgradedVersions: SoftwareVersion: 2.31.0-1540, CertVersion: , ConfigVersion: 1.0.0, Status: initiated, K8S: <nil>, ErrorMessage: , BaseWorkingDirectory: /var/horizon/nmp, AgentUpgradeInternal: <nil>.
I0617 23:31:14.188338      15 rpc.go:94] Exchange RPC Invoking exchange PUT at http://9.46.84.245:3090/v1/orgs/myorg/nodes/my-edge-agent/managementStatus/MyNmpFor1540 with AgentUpgrade: ScheduledTime: 2024-06-17T23:26:06Z, ActualStartTime: 2024-06-17T23:26:31Z, CompletionTime: , UpgradedVersions: SoftwareVersion: 2.31.0-1540, CertVersion: , ConfigVersion: 1.0.0, Status: initiated, K8S: <nil>, ErrorMessage: , BaseWorkingDirectory: , AgentUpgradeInternal: <nil>
I0617 23:31:14.220752      15 cluster_upgrade_worker.go:603] Cluster upgrade worker: reading in agent config file: /var/horizon/nmp/myorg/MyNmpFor1540/agent-install.cfg
I0617 23:31:14.220912      15 cluster_install_files.go:53] Cluster upgrade worker: get HZN_EXCHANGE_URL=http://9.46.84.245:3090/v1
I0617 23:31:14.220927      15 cluster_install_files.go:53] Cluster upgrade worker: get HZN_FSS_CSSURL=http://9.46.84.245:9443/
I0617 23:31:14.220954      15 cluster_install_files.go:53] Cluster upgrade worker: get HZN_AGBOT_URL=http://9.46.84.245:3111
I0617 23:31:14.220963      15 cluster_install_files.go:53] Cluster upgrade worker: get HZN_FDO_SVC_URL=http://9.46.84.245:9008/api
I0617 23:31:14.220971      15 cluster_install_files.go:53] Cluster upgrade worker: get AGENT_NAMESPACE=openhorizon-agent
I0617 23:31:14.220984      15 cluster_install_files.go:53] Cluster upgrade worker: get HZN_CONFIG_VERSION=1.0.0
I0617 23:31:14.221012      15 kubeClient.go:62] Cluster upgrade worker: Read configmap value openhorizon-agent-config under agent namespace my-edge
I0617 23:31:14.221050      15 kubeClient.go:55] Cluster upgrade worker: Get configmap openhorizon-agent-config under agent namespace my-edge
I0617 23:31:14.237151      15 kubeClient.go:100] Cluster upgrade worker: In configmap openhorizon-agent-config find HZN_EXCHANGE_URL=http://9.46.84.245:3090/v1
I0617 23:31:14.237180      15 kubeClient.go:100] Cluster upgrade worker: In configmap openhorizon-agent-config find HZN_FSS_CSSURL=http://9.46.84.245:9443/
I0617 23:31:14.237188      15 kubeClient.go:100] Cluster upgrade worker: In configmap openhorizon-agent-config find HZN_AGBOT_URL=http://9.46.84.245:3111
I0617 23:31:14.237195      15 kubeClient.go:100] Cluster upgrade worker: In configmap openhorizon-agent-config find HZN_FDO_SVC_URL=http://9.46.84.245:9008/api
I0617 23:31:14.237205      15 kubeClient.go:100] Cluster upgrade worker: In configmap openhorizon-agent-config find HZN_DEVICE_ID=my-edge-agent
I0617 23:31:14.237212      15 kubeClient.go:100] Cluster upgrade worker: In configmap openhorizon-agent-config find HZN_NODE_ID=my-edge-agent
I0617 23:31:14.237218      15 kubeClient.go:100] Cluster upgrade worker: In configmap openhorizon-agent-config find HZN_AGENT_PORT=8510
I0617 23:31:14.237225      15 kubeClient.go:100] Cluster upgrade worker: In configmap openhorizon-agent-config find HZN_CONFIG_VERSION=
I0617 23:31:14.237235      15 cluster_upgrade_worker.go:637] Cluster upgrade worker: agent install config is same: false
I0617 23:31:14.240257      15 cluster_install_files.go:239] Cluster upgrade worker: configmap.needChange is set to true in status file
I0617 23:31:14.240280      15 cluster_upgrade_worker.go:645] Cluster upgrade worker: reading in agent cert file: /var/horizon/nmp/myorg/MyNmpFor1540/agent-install.crt
I0617 23:31:14.240304      15 cluster_upgrade_worker.go:443] Cluster upgrade worker: configIsSame: false, certIsSame: true, will need to validate config and cert for nmp myorg/MyNmpFor1540
E0617 23:31:14.240400      15 cluster_upgrade_worker.go:459] Cluster upgrade worker: Failed to validate exchangeURL and/or cert for nmp: myorg/MyNmpFor1540, error: open /etc/default/cert/agent-install.crt: no such file or directory
I0617 23:31:14.240415      15 cluster_upgrade_worker.go:259] Cluster upgrade worker: Set status to precheck failed in db and status file for nmp myorg/MyNmpFor1540
I0617 23:31:14.242427      15 node_management_status.go:15] Saving nmp status AgentUpgrade: ScheduledTime: 2024-06-17T23:26:06Z, ActualStartTime: 2024-06-17T23:26:31Z, CompletionTime: , UpgradedVersions: SoftwareVersion: 2.31.0-1540, CertVersion: , ConfigVersion: 1.0.0, Status: precheck failed, K8S: <nil>, ErrorMessage: Failed to validate exchangeURL and/or cert for nmp: myorg/MyNmpFor1540, error: open /etc/default/cert/agent-install.crt: no such file or directory, BaseWorkingDirectory: /var/horizon/nmp, AgentUpgradeInternal: AllowDowngrade: false, Manifest: IBM/edgeNodeFiles_manifest_2.31.0-1540, ScheduledUnixTime: 2024-06-17 23:26:06 +0000 UTC, LatestMap: SoftwareLatest: false, ConfigLatest: false, CertLatest: false
I0617 23:31:14.249051      15 node_management_status.go:40] Putting node management policy status for node myorg/my-edge-agent and policy MyNmpFor1540. Status is: AgentUpgrade: ScheduledTime: 2024-06-17T23:26:06Z, ActualStartTime: 2024-06-17T23:26:31Z, CompletionTime: , UpgradedVersions: SoftwareVersion: 2.31.0-1540, CertVersion: , ConfigVersion: 1.0.0, Status: precheck failed, K8S: <nil>, ErrorMessage: Failed to validate exchangeURL and/or cert for nmp: myorg/MyNmpFor1540, error: open /etc/default/cert/agent-install.crt: no such file or directory, BaseWorkingDirectory: /var/horizon/nmp, AgentUpgradeInternal: <nil>.
I0617 23:31:14.249207      15 rpc.go:94] Exchange RPC Invoking exchange PUT at http://9.46.84.245:3090/v1/orgs/myorg/nodes/my-edge-agent/managementStatus/MyNmpFor1540 with AgentUpgrade: ScheduledTime: 2024-06-17T23:26:06Z, ActualStartTime: 2024-06-17T23:26:31Z, CompletionTime: , UpgradedVersions: SoftwareVersion: 2.31.0-1540, CertVersion: , ConfigVersion: 1.0.0, Status: precheck failed, K8S: <nil>, ErrorMessage: Failed to validate exchangeURL and/or cert for nmp: myorg/MyNmpFor1540, error: open /etc/default/cert/agent-install.crt: no such file or directory, BaseWorkingDirectory: , AgentUpgradeInternal: <nil>
I0617 23:31:14.289990      15 cluster_upgrade_worker.go:286] Cluster upgrade worker: Status is updated to AgentUpgrade: ScheduledTime: 2024-06-17T23:26:06Z, ActualStartTime: 2024-06-17T23:26:31Z, CompletionTime: , UpgradedVersions: SoftwareVersion: 2.31.0-1540, CertVersion: , ConfigVersion: 1.0.0, Status: precheck failed, K8S: <nil>, ErrorMessage: Failed to validate exchangeURL and/or cert for nmp: myorg/MyNmpFor1540, error: open /etc/default/cert/agent-install.crt: no such file or directory, BaseWorkingDirectory: /var/horizon/nmp, AgentUpgradeInternal: AllowDowngrade: false, Manifest: IBM/edgeNodeFiles_manifest_2.31.0-1540, ScheduledUnixTime: 2024-06-17 23:26:06 +0000 UTC, LatestMap: SoftwareLatest: false, ConfigLatest: false, CertLatest: false for nmp myorg/MyNmpFor1540
I0617 23:31:14.290041      15 worker.go:325] CommandDispatcher: ClusterUpgrade handled command (*clusterupgrade.ClusterUpgradeCommand)
I0617 23:31:14.290061      15 worker.go:353] CommandDispatcher: ClusterUpgrade command processor blocking for commands

Since the hub uses http, the cert check should be skipped when the NMP only performs a software update or a config update.
I would expect a cert update to simply be ignored in that case.
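To make the suggestion concrete, here is a minimal sketch of the kind of check I have in mind. It is not the anax implementation: `needsCertCheck` is a hypothetical helper, and the https URL in `main` is a made-up example. The only assumption is that the decision can be made from the scheme of HZN_EXCHANGE_URL.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// needsCertCheck is a hypothetical helper (not part of anax). It reports
// whether a certificate would need to be validated for the given exchange
// URL: true for https, false for http, and an error for anything else.
func needsCertCheck(exchangeURL string) (bool, error) {
	u, err := url.Parse(exchangeURL)
	if err != nil {
		return false, fmt.Errorf("parse exchange URL %q: %w", exchangeURL, err)
	}
	switch strings.ToLower(u.Scheme) {
	case "https":
		return true, nil
	case "http":
		// Plain http: there is no TLS handshake, so no cert to validate.
		return false, nil
	default:
		return false, fmt.Errorf("unsupported scheme %q in exchange URL %q", u.Scheme, exchangeURL)
	}
}

func main() {
	urls := []string{
		"http://9.46.84.245:3090/v1",      // value from the logs above
		"https://exchange.example.com/v1", // made-up https example
	}
	for _, raw := range urls {
		need, err := needsCertCheck(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> cert check needed: %t\n", raw, need)
	}
}
```

With a check like this, the precheck would only try to open agent-install.crt when the exchange URL is https.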

Describe the steps to reproduce the behavior.

No response

Expected behavior.

No response

Screenshots.

No response

Operating Environment

Linux

Additional Information

No response

Closing this. The manifest required a certificate upgrade, and the certificate was invalid in this case.