toolboc/IoTEdge-DevOps

Helm init error

ToadTWP666 opened this issue · 12 comments

2020-09-11T15:44:07.7008740Z ##[section]Starting: helm init
2020-09-11T15:44:07.7132804Z ==============================================================================
2020-09-11T15:44:07.7133625Z Task : Package and deploy Helm charts
2020-09-11T15:44:07.7134063Z Description : Deploy, configure, update a Kubernetes cluster in Azure Container Service by running helm commands
2020-09-11T15:44:07.7135082Z Version : 0.173.0
2020-09-11T15:44:07.7135395Z Author : Microsoft Corporation
2020-09-11T15:44:07.7135690Z Help : https://aka.ms/azpipes-helm-tsg
2020-09-11T15:44:07.7136011Z ==============================================================================
2020-09-11T15:44:08.6613708Z [command]C:\hostedtoolcache\windows\helm\3.3.1\x64\windows-amd64\helm.exe init --upgrade --wait
2020-09-11T15:44:09.3117509Z Error: unknown command "init" for "helm"
2020-09-11T15:44:09.3120238Z
2020-09-11T15:44:09.3121253Z Did you mean this?
2020-09-11T15:44:09.3121810Z lint
2020-09-11T15:44:09.3133817Z
2020-09-11T15:44:09.3134646Z Run 'helm --help' for usage.
2020-09-11T15:44:09.3202147Z ##[error]Error: unknown command "init" for "helm"

Did you mean this?
lint

Run 'helm --help' for usage.

2020-09-11T15:44:09.3261617Z ##[section]Finishing: helm init

I am getting the error above when I try to execute the Integration step in the release pipeline. The failing step is the helm init step. I am using Helm version 2.9.1. Please advise.

Interesting, it looks like the helm init command has been removed in recent versions of Helm, and there is no direct replacement for it.

The logs indicate that you are on version 3.3.1, which is affected by the aforementioned removal of helm init (every release from 3.0 onward):
C:\hostedtoolcache\windows\helm\3.3.1\x64\windows-amd64\helm.exe init --upgrade --wait

Try changing the command to helm upgrade --wait and see if that works.
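If the pipeline may end up with either Helm major version, one option is to decide whether the helm init step should run at all based on the installed client. Below is a minimal sketch in Python, assuming the input is the output of `helm version --short` (Helm 3 prints something like `v3.3.1`, while Helm 2 prefixes the client version with `Client:`). The function names are hypothetical and are not part of this repo.

```python
import re
from typing import List

# Matches the first semantic version in `helm version --short` style output,
# e.g. "v3.3.1" (Helm 3) or "Client: v2.9.1" (Helm 2).
_VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def helm_major_version(version_output: str) -> int:
    """Return the Helm major version found in the given output."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        raise ValueError(f"no Helm version found in: {version_output!r}")
    return int(match.group(1))


def init_step_args(version_output: str) -> List[str]:
    """Return the arguments for the 'helm init' step, or [] to skip it.

    Helm 2 uses `helm init` to install or upgrade Tiller in the cluster.
    Helm 3 removed both Tiller and the `init` command, so the step is skipped.
    """
    if helm_major_version(version_output) >= 3:
        return []
    return ["init", "--upgrade", "--wait"]


if __name__ == "__main__":
    print(init_step_args("v3.3.1"))          # []
    print(init_step_args("Client: v2.9.1"))  # ['init', '--upgrade', '--wait']
```

Skipping the step on Helm 3 (rather than substituting another command) reflects the fact that Helm 3 has no server-side component to initialize.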

Hi Paul. This is Rick Durham from the GBB ML/AI team at MSFT. Thanks for the assistance. I really like this repo for what it does, i.e., providing a way to smoke test your deployment and test its scalability and robustness, and I plan on demoing it on Monday night if I can get it working. If I change step 2 of the Integration release pipeline step (previously helm init) to a helm upgrade, the UI requires me to provide a chart name. What should I use as the chart name? It seems like we also do an upgrade in step 4 (the helm upgrade step). Would we need both?

Hi Rick, it seems this issue could be solved a couple ways:

#1 Ensure that the helm instance is in fact version 2.9.1 (these steps are confirmed working there and it may be the easiest path forward)

#2 Attempt to work around the removal of helm init in 3.3.1 by modifying any references to that command. The value for the chart name should be: azure-iot-edge-device-container/azure-iot-edge-device-container. You should only need to run the helm upgrade step once; perhaps simply removing the helm init step will be enough?

Hi Paul. I opted for solution #1 and set the version in the first step to 2.9.1. Here is the result from executing that step:
2020-09-11T21:20:11.0747814Z ##[section]Starting: Install Helm 2.9.1
2020-09-11T21:20:11.0903225Z ==============================================================================
2020-09-11T21:20:11.0903630Z Task : Helm tool installer
2020-09-11T21:20:11.0903951Z Description : Install Helm on an agent machine
2020-09-11T21:20:11.0904219Z Version : 1.171.0
2020-09-11T21:20:11.0904491Z Author : Microsoft Corporation
2020-09-11T21:20:11.0904840Z Help : https://aka.ms/azpipes-helm-installer-tsg
2020-09-11T21:20:11.0905230Z ==============================================================================
2020-09-11T21:20:11.3885969Z Downloading: https://get.helm.sh/helm-v2.9.1-windows-amd64.zip
2020-09-11T21:20:12.2640664Z Extracting archive
2020-09-11T21:20:12.2690523Z [command]C:\windows\system32\chcp.com 65001
2020-09-11T21:20:12.2872119Z Active code page: 65001
2020-09-11T21:20:12.3121465Z [command]C:\windows\System32\WindowsPowerShell\v1.0\powershell.exe -NoLogo -Sta -NoProfile -NonInteractive -ExecutionPolicy Unrestricted -Command "$ErrorActionPreference = 'Stop' ; try { Add-Type -AssemblyName System.IO.Compression.FileSystem } catch { } ; [System.IO.Compression.ZipFile]::ExtractToDirectory('D:\a\_temp\helm-v2.9.1-e981e712-566d-456a-ac2b-755e981dcdab.zip', 'D:\a\_temp\574ed942-35a5-4a97-ab17-28d7918ba08b')"
2020-09-11T21:20:13.2110459Z Caching tool: helm 2.9.1 x64
2020-09-11T21:20:13.2567721Z Prepending PATH environment variable with directory: C:\hostedtoolcache\windows\helm\2.9.1\x64\windows-amd64
2020-09-11T21:20:13.2573617Z Verifying helm installation...
2020-09-11T21:20:13.2631624Z [command]C:\hostedtoolcache\windows\helm\2.9.1\x64\windows-amd64\helm.exe init --client-only
2020-09-11T21:20:13.7640712Z Creating C:\Users\VssAdministrator\.helm
2020-09-11T21:20:13.7645670Z Creating C:\Users\VssAdministrator\.helm\repository
2020-09-11T21:20:13.7708633Z Creating C:\Users\VssAdministrator\.helm\repository\cache
2020-09-11T21:20:13.7710560Z Creating C:\Users\VssAdministrator\.helm\repository\local
2020-09-11T21:20:13.7713017Z Creating C:\Users\VssAdministrator\.helm\plugins
2020-09-11T21:20:13.7715930Z Creating C:\Users\VssAdministrator\.helm\starters
2020-09-11T21:20:13.7776317Z Creating C:\Users\VssAdministrator\.helm\cache\archive
2020-09-11T21:20:13.7780922Z Creating C:\Users\VssAdministrator\.helm\repository\repositories.yaml
2020-09-11T21:20:13.7781539Z Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
2020-09-11T21:20:15.7664360Z Adding local repo with URL: http://127.0.0.1:8879/charts
2020-09-11T21:20:15.8646287Z $HELM_HOME has been configured at C:\Users\VssAdministrator\.helm.
2020-09-11T21:20:15.8647580Z Not installing Tiller due to 'client-only' flag having been set
2020-09-11T21:20:15.8648260Z Happy Helming!
2020-09-11T21:20:15.9049878Z ##[section]Finishing: Install Helm 2.9.1

However, now I am getting the following error from the helm init step:
2020-09-11T21:20:15.9095527Z ##[section]Starting: helm init
2020-09-11T21:20:15.9221699Z ==============================================================================
2020-09-11T21:20:15.9222126Z Task : Package and deploy Helm charts
2020-09-11T21:20:15.9222558Z Description : Deploy, configure, update a Kubernetes cluster in Azure Container Service by running helm commands
2020-09-11T21:20:15.9222946Z Version : 0.173.0
2020-09-11T21:20:15.9223203Z Author : Microsoft Corporation
2020-09-11T21:20:15.9223522Z Help : https://aka.ms/azpipes-helm-tsg
2020-09-11T21:20:15.9223897Z ==============================================================================
2020-09-11T21:20:16.7482836Z [command]C:\hostedtoolcache\windows\helm\2.9.1\x64\windows-amd64\helm.exe init --upgrade --wait
2020-09-11T21:20:17.0722029Z Error: error installing: the server could not find the requested resource
2020-09-11T21:20:17.0722485Z $HELM_HOME has been configured at C:\Users\VssAdministrator\.helm.
2020-09-11T21:20:17.0841970Z ##[error]Error: error installing: the server could not find the requested resource

2020-09-11T21:20:17.0877831Z ##[section]Finishing: helm init

Please note that I completely blew away my original AKS cluster and created a new one to start from a clean slate, so to speak, and refreshed all of the UI inputs.

Hmm, it looks like your workload is running on a Windows agent:

C:\hostedtoolcache\windows\helm\2.9.1\x64\windows-amd64\helm.exe init --upgrade --w

What happens when you switch the agent to the Hosted Ubuntu platform?

Testing now.

So I recreated the release pipeline using Ubuntu 16.04 and got the following error from the helm init step:
2020-09-11T22:18:04.9440962Z ##[section]Starting: helm init
2020-09-11T22:18:04.9456715Z ==============================================================================
2020-09-11T22:18:04.9457453Z Task : Package and deploy Helm charts
2020-09-11T22:18:04.9457940Z Description : Deploy, configure, update a Kubernetes cluster in Azure Container Service by running helm commands
2020-09-11T22:18:04.9458360Z Version : 0.173.0
2020-09-11T22:18:04.9458647Z Author : Microsoft Corporation
2020-09-11T22:18:04.9458986Z Help : https://aka.ms/azpipes-helm-tsg
2020-09-11T22:18:04.9459380Z ==============================================================================
2020-09-11T22:18:05.7064597Z [command]/opt/hostedtoolcache/helm/2.9.1/x64/linux-amd64/helm init --upgrade --wait
2020-09-11T22:18:06.0385502Z Error: error installing: the server could not find the requested resource
2020-09-11T22:18:06.0388290Z $HELM_HOME has been configured at /home/vsts/.helm.
2020-09-11T22:18:06.0437130Z ##[error]Error: error installing: the server could not find the requested resource

2020-09-11T22:18:06.0500705Z ##[section]Finishing: helm init

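It looks like we are getting closer. The issue now seems to be due to a change in Kubernetes 1.16 that breaks the way Tiller is installed: 1.16 no longer serves Deployments under the deprecated extensions/v1beta1 API, which older Helm 2 releases such as 2.9.1 use for the Tiller Deployment, hence "the server could not find the requested resource".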
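To check whether a given Tiller manifest will hit this, one can render it without installing (Helm 2's `helm init --output yaml`) and compare the Deployment's apiVersion against the groups Kubernetes 1.16 stopped serving for Deployments (`extensions/v1beta1`, `apps/v1beta1`, `apps/v1beta2`). A rough sketch in Python, assuming the manifest is a multi-document YAML string with top-level `apiVersion` and `kind` keys; the helper names and the abbreviated sample manifest are illustrative only.

```python
from typing import Optional

# Deployment API groups that Kubernetes 1.16 no longer serves.
REMOVED_IN_1_16 = {"extensions/v1beta1", "apps/v1beta1", "apps/v1beta2"}


def deployment_api_version(manifest_yaml: str) -> Optional[str]:
    """Return the apiVersion of the first Deployment in a multi-document YAML string.

    Deliberately naive line scan (no YAML library): it only looks at
    unindented `apiVersion:` and `kind:` keys in each document.
    """
    for document in manifest_yaml.split("\n---"):
        api_version = kind = None
        for line in document.splitlines():
            if line.startswith("apiVersion:"):
                api_version = line.split(":", 1)[1].strip()
            elif line.startswith("kind:"):
                kind = line.split(":", 1)[1].strip()
        if kind == "Deployment":
            return api_version
    return None


def deployment_served(api_version: str, server_minor: int) -> bool:
    """True if a Kubernetes 1.<server_minor> cluster still serves Deployments at this API version."""
    return not (server_minor >= 16 and api_version in REMOVED_IN_1_16)


if __name__ == "__main__":
    # Hypothetical, abbreviated manifest in the shape older Helm 2 releases render.
    sample = (
        "apiVersion: extensions/v1beta1\n"
        "kind: Deployment\n"
        "metadata:\n"
        "  name: tiller-deploy\n"
        "---\n"
        "apiVersion: v1\n"
        "kind: Service\n"
    )
    api = deployment_api_version(sample)
    print(api)                          # extensions/v1beta1
    print(deployment_served(api, 15))   # True
    print(deployment_served(api, 16))   # False
```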

I have verified on my end that the deployment works on a Kubernetes 1.15 instance.

I will upgrade the cluster to the latest version and report back on whether I can reproduce the issue.

Edit: That looks like the root cause. I have upgraded to 1.16 and am now receiving the same error; stand by...

@ToadTWP666,

I looked into the issue, and unfortunately the changes between Kubernetes 1.15 and 1.16 are pretty significant. I've updated the azure-iot-edge-device-container Helm chart to support 1.16 and have confirmed that it can be used with Helm 2.15+. However, it requires additional RBAC configuration to enable Tiller via helm init, which in my opinion makes this tutorial much more complex than it needs to be.
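For anyone who does want to stay on an RBAC-enabled cluster, the extra configuration typically involves giving Tiller its own service account before running helm init. Below is a hedged sketch in Python of the commonly documented Helm 2 setup: a `tiller` service account in `kube-system` bound to `cluster-admin`. This is not necessarily what the updated chart expects; the account name and namespace are illustrative defaults, `cluster-admin` is far broader than least privilege, and per this thread it would still need Helm 2.15+ on Kubernetes 1.16.

```python
import shlex
import subprocess
from typing import List

TILLER_ACCOUNT = "tiller"        # illustrative name, not taken from this repo
TILLER_NAMESPACE = "kube-system"


def tiller_rbac_commands(account: str = TILLER_ACCOUNT,
                         namespace: str = TILLER_NAMESPACE) -> List[List[str]]:
    """Commands that create a service account for Tiller and install Tiller with it."""
    return [
        ["kubectl", "create", "serviceaccount", account, "--namespace", namespace],
        ["kubectl", "create", "clusterrolebinding", account,
         "--clusterrole=cluster-admin",
         f"--serviceaccount={namespace}:{account}"],
        ["helm", "init", "--service-account", account, "--upgrade", "--wait"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing command


if __name__ == "__main__":
    run_all(tiller_rbac_commands())  # dry run: prints the commands only
```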

The good news is that the workaround is rather simple: all of these issues go away, and everything works as-is, if your Kubernetes service is at version 1.15.x and RBAC is disabled when you create your AKS cluster in the Azure Portal.

Thus, you should be able to get up and running by creating a new AKS instance on version 1.15.x. While AKS does allow in-place upgrades, downgrades are not supported, so you will have to create a new instance rather than downgrade the existing one. In addition, if RBAC is enabled at creation time, it cannot be disabled without recreating the cluster.
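Since neither the version nor the RBAC setting can be changed in place, a small preflight check can catch an unsupported cluster before the pipeline reaches helm init. A minimal sketch in Python, assuming the input is the JSON printed by `az aks show --resource-group <rg> --name <cluster>` and that it exposes `kubernetesVersion` and `enableRbac` keys; the function name is hypothetical.

```python
import json
from typing import List


def preflight_problems(aks_show_json: str) -> List[str]:
    """Return reasons the cluster does not match the supported setup (1.15.x, RBAC off)."""
    cluster = json.loads(aks_show_json)
    problems = []
    version = cluster.get("kubernetesVersion", "")
    if version.split(".")[:2] != ["1", "15"]:
        problems.append(f"Kubernetes {version or 'unknown'} is not 1.15.x")
    # Treat a missing key as "RBAC enabled" to stay on the safe side.
    if cluster.get("enableRbac", True):
        problems.append("RBAC is enabled; recreate the cluster with RBAC disabled")
    return problems


if __name__ == "__main__":
    print(preflight_problems('{"kubernetesVersion": "1.15.12", "enableRbac": false}'))  # []
    print(preflight_problems('{"kubernetesVersion": "1.16.13", "enableRbac": true}'))
```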

My thoughts in reaction to this issue:
The Azure portal currently defaults to version 1.16.13 with RBAC enabled, and while I'd like to support it, for the aforementioned reasons I feel it is best at this time to version-lock this repo to Kubernetes 1.15.x and possibly revisit 1.16+ with RBAC support at a later date. While adding support for the default version with Role-Based Access Control is likely the right thing to do in production, it is also a hefty task that will require updating numerous portions of this repo and the associated module currently on MS Learn, and it would ultimately leave the user with more things that can go wrong (RBAC is not fun for the uninitiated). If this were an easy overnight fix, I'd be all for it, but it will take some time to address properly, and even then it could change again. The trade-off here is to lock for stability, improving the experience of following this tutorial, rather than trying to support every version out there. This issue also raises a larger concern: AKS cannot downgrade after an in-place upgrade or toggle RBAC after creation, and as we have seen here, both can be breaking in nature.

Thank you for filing the issue and working with me to resolve it. Please let me know if you experience any issues after deploying to a 1.15.x instance with RBAC disabled. In the meantime, I will update this repo to state that it explicitly supports Kubernetes 1.15.x with RBAC disabled + Tiller 2.9.1 at this time.