microsoft/Docker-Provider

Helm deployment does not on-board Azure Monitor for containers

sossickd opened this issue · 4 comments

Following the instructions at the charts/azuremonitor-containers URL, the Helm deployment does not onboard Azure Monitor for containers.

[Screenshot attached: 2021-07-21 at 10:31:57]

Expected behaviour: Following the instructions would automatically onboard Azure Monitor for containers.

Environment: AKS
Kubernetes: 1.19.11

Steps 1 and 2 complete successfully, with a Log Analytics workspace and the ContainerInsights(iob-dev-westeurope-akstest-workspace) solution created in the same resource group as the AKS cluster.
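For reference, the workspace created in steps 1 and 2 can be sanity-checked from the Azure CLI; this is a quick sketch (the resource group and workspace names are the ones from this issue and may differ in other environments):

# confirm the Log Analytics workspace exists and note its full resource ID
az monitor log-analytics workspace show \
  --resource-group iob-dev-westeurope-akstest-rg-aks \
  --workspace-name iob-dev-westeurope-akstest-workspace \
  --query id -o tsv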

Step 3 fails with the error "No k8s-master VMs or VMSSes found in the specified resource group:iob-dev-westeurope-akstest-rg-aks", but, looking at the script, I am not sure that step applies to AKS.
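Since the VM/VMSS tagging step in the script appears to target non-AKS clusters, the workspace GUID and key that the Helm chart needs can be fetched directly instead; a sketch, assuming the same resource group and workspace names as above:

# workspace GUID (wsid) expected by the chart
az monitor log-analytics workspace show \
  --resource-group iob-dev-westeurope-akstest-rg-aks \
  --workspace-name iob-dev-westeurope-akstest-workspace \
  --query customerId -o tsv
# primary shared key (key) expected by the chart
az monitor log-analytics workspace get-shared-keys \
  --resource-group iob-dev-westeurope-akstest-rg-aks \
  --workspace-name iob-dev-westeurope-akstest-workspace \
  --query primarySharedKey -o tsv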

Helm deployment completes without error.

helm upgrade --install --values=values.yaml azmon-containers microsoft/azuremonitor-containers --namespace kube-system

Release "azmon-containers" does not exist. Installing it now.
W0721 09:12:19.676421 20988 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
W0721 09:12:19.834125 20988 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
W0721 09:12:22.908504 20988 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
W0721 09:12:23.088834 20988 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
NAME: azmon-containers
LAST DEPLOYED: Wed Jul 21 09:12:18 2021
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
azmon-containers deployment is complete.
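For comparison, my understanding of the chart's documented onboarding is that the workspace ID/key and cluster name can also be passed with --set rather than a values file; a minimal sketch (the parameter names below are from the chart README as I recall, so please verify them against the chart's values.yaml):

helm upgrade --install azmon-containers microsoft/azuremonitor-containers \
  --namespace kube-system \
  --set omsagent.secret.wsid=<workspace-guid> \
  --set omsagent.secret.key=<workspace-primary-key> \
  --set omsagent.env.clusterName=<aks-cluster-name>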

Log output from omsagent

kubectl logs omsagent-48t4s -n kube-system
not setting customResourceId
Making curl request to oms endpint with domain: opinsights.azure.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl request to oms endpoint succeeded.
****************Start Config Processing********************
Both stdout & stderr log collection are turned off for namespaces: '*_kube-system_*.log'
****************End Config Processing********************
****************Start Config Processing********************
****************Start NPM Config Processing********************
config::npm::Successfully substituted the NPM placeholders into /etc/opt/microsoft/docker-cimprov/telegraf.conf file for DaemonSet
config::Starting to substitute the placeholders in td-agent-bit.conf file for log collection
config::Successfully substituted the placeholders in td-agent-bit.conf file
****************Start Prometheus Config Processing********************
config::No configmap mounted for prometheus custom config, using defaults
****************End Prometheus Config Processing********************
****************Start MDM Metrics Config Processing********************
****************End MDM Metrics Config Processing********************
****************Start Metric Collection Settings Processing********************
****************End Metric Collection Settings Processing********************
Making wget request to cadvisor endpoint with port 10250
Wget request using port 10250 succeeded. Using 10250
Making curl request to cadvisor endpoint /pods with port 10250 to get the configured container runtime on kubelet
configured container runtime on kubelet is : containerd
set caps for ruby process to read container env from proc
aks-system1-34726002-vmss000000
 * Starting periodic command scheduler cron
   ...done.
docker-cimprov 16.0.0.0
DOCKER_CIMPROV_VERSION=16.0.0.0
*** activating oneagent in legacy auth mode ***
setting mdsd workspaceid & key for workspace:68299338-cb11-46a8-a42e-977e476105e4
azure-mdsd 1.10.1-build.master.213
starting mdsd in legacy auth mode in main container...
*** starting fluentd v1 in daemonset
starting fluent-bit and setting telegraf conf file for daemonset
since container run time is containerd update the container log fluentbit Parser to cri from docker
nodename: aks-system1-34726002-vmss000000
replacing nodename in telegraf config
checking for listener on tcp #25226 and waiting for 30 secs if not..
File Doesnt Exist. Creating file...
Fluent Bit v1.6.8
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

waitforlisteneronTCPport found listener on port:25226 in 5 secs
checking for listener on tcp #25228 and waiting for 30 secs if not..
Routing container logs thru v2 route...
waitforlisteneronTCPport found listener on port:25228 in 10 secs
Telegraf 1.18.0 (git: HEAD ac5c7f6a)
2021-07-21T08:12:52Z I! Starting Telegraf 1.18.0
td-agent-bit 1.6.8
stopping rsyslog...
 * Stopping enhanced syslogd rsyslogd
   ...done.
getting rsyslog status...
 * rsyslogd is not running

Can you confirm whether onboarding of Azure Monitor for containers should have occurred, and how to troubleshoot this further?
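A few basic checks I have been using while investigating (a sketch, not official guidance; the secret name is an assumption based on the chart):

# all omsagent daemonset and replicaset pods should be Running
kubectl get pods -n kube-system | grep omsagent
# assumption: the chart stores the workspace id/key in a secret named omsagent-secret
kubectl get secret omsagent-secret -n kube-system
# look for send/auth errors in the replicaset pod (pod name elided here)
kubectl logs <omsagent-rs-pod-name> -n kube-system | grep -i error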

@sossickd - Why are you installing through our Helm chart in an AKS cluster, rather than using the native integration with AKS?

I have been asked to investigate using the Helm deployment instead of the AKS add-on so that we have more control over updates.

This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.

This issue was closed because it has been stalled for 12 days with no activity.