Azure/application-gateway-kubernetes-ingress

ingress appgw pod - forced restart necessary to create listener etc. for new frontend ip

VaclavK opened this issue · 1 comments

Describe the bug
The ingress-appgw-container pod has to be restarted manually before AGIC refreshes its config to match changes made directly to the App GW.

A new private IP was added to the App GW as an additional frontend configuration; a pod restart was necessary for the listener and its ancillary resources to be created.

To Reproduce

  • deployed App GW with Public IP ingress
  • deployed AKS with App GW ingress addon - greenfield
  • a few months passed; a new requirement arose to support private ingress
  • private ip frontend config added to the App GW
  • no listener etc. registered despite logs showing resync running on schedule
  • pod forcefully restarted (deleted the pod; a new one was spun up)
  • listener etc. created
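
The forced restart in the last steps does not require deleting the pod by hand. A minimal sketch, assuming the default AKS addon deployment name `ingress-appgw-deployment` in `kube-system` (as shown in the pod describe output below):

```shell
# Restart the AGIC addon deployment so it rebuilds the App GW config.
# Verify the deployment name first with: kubectl get deploy -n kube-system
kubectl rollout restart deployment ingress-appgw-deployment -n kube-system

# Or delete the pod directly and let the ReplicaSet spin up a new one
# (the app=ingress-appgw label matches the pod described below):
kubectl delete pod -n kube-system -l app=ingress-appgw
```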

Logs are attached below. I see an error pointing to #1582, but since a restart creates the listener etc., that issue does not seem to be related.

Is my expectation that the listener should be created correct, especially since a restart does create it?

Ingress Controller details

Some redacting was done.

  • Output of `kubectl describe pod <ingress controller>`. The pod name can be obtained by running `helm list`.
Name:             ingress-appgw-deployment-68f9bc65f9-pfz72
Namespace:        kube-system
Priority:         0
Service Account:  ingress-appgw-sa
Node:             aks-core-38546464-vmss000007/10.16.20.4
Start Time:       Fri, 05 Jan 2024 13:38:16 +0000
Labels:           app=ingress-appgw
                  kubernetes.azure.com/managedby=aks
                  pod-template-hash=68f9bc65f9
Annotations:      checksum/config: 65cd702e9dc47583d803b3590579c768f182985fe3e1e34d8a97608266a5f846
                  cluster-autoscaler.kubernetes.io/safe-to-evict: true
                  kubernetes.azure.com/metrics-scrape: true
                  prometheus.io/path: /metrics
                  prometheus.io/port: 8123
                  prometheus.io/scrape: true
                  resource-id:
                    /subscriptions/******/resourceGroups/compute-ne-dev/providers/Microsoft.ContainerService/managedClusters/com...
Status:           Running
IP:               10.244.6.14
IPs:
  IP:           10.244.6.14
Controlled By:  ReplicaSet/ingress-appgw-deployment-68f9bc65f9
Containers:
  ingress-appgw-container:
    Container ID:   containerd://c904aff219fa891a5d9f2d538a6f7d44f70bc957232803c4ac59c64ec746f1bd
    Image:          mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.7.2
    Image ID:       mcr.microsoft.com/azure-application-gateway/kubernetes-ingress@sha256:eeb1d42ebfb872478d9b0b16f6936ea938d6e5eed4a59cde332b8757556a5e1f
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 05 Jan 2024 13:38:16 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     700m
      memory:  600Mi
    Requests:
      cpu:      100m
      memory:   20Mi
    Liveness:   http-get http://:8123/health/alive delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:  http-get http://:8123/health/ready delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      ingress-appgw-cm  ConfigMap  Optional: false
    Environment:
      AGIC_POD_NAME:                  ingress-appgw-deployment-68f9bc65f9-pfz72 (v1:metadata.name)
      AGIC_POD_NAMESPACE:             kube-system (v1:metadata.namespace)
      KUBERNETES_PORT_443_TCP_ADDR:   blah
      KUBERNETES_PORT:                tcp://blah:443
      KUBERNETES_PORT_443_TCP:        tcp://blah:443
      KUBERNETES_SERVICE_HOST:        blah
      AZURE_CLOUD_PROVIDER_LOCATION:  /etc/kubernetes/azure.json
    Mounts:
      /etc/kubernetes/azure.json from cloud-provider-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-r2r68 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  cloud-provider-config:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/azure.json
    HostPathType:  File
  kube-api-access-r2r68:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                     Age   From                       Message
  ----     ------                     ----  ----                       -------
  Normal   Scheduled                  36m   default-scheduler          Successfully assigned kube-system/ingress-appgw-deployment-68f9bc65f9-pfz72 to aks-core-38546464-vmss000007
  Normal   Pulled                     36m   kubelet                    Container image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.7.2" already present on machine
  Normal   Created                    36m   kubelet                    Created container ingress-appgw-container     
  Normal   Started                    36m   kubelet                    Started container ingress-appgw-container 

  • Output of `kubectl logs <ingress controller>`.

[agic.txt](https://github.com/Azure/application-gateway-kubernetes-ingress/files/13843268/agic.txt)

  • Any Azure support tickets associated with this issue.

I think this may be a chicken-and-egg situation / an invalid test on my side.

I did a fresher test in another deployment (region) in our dev environment, and there I ran the expected sequence: the frontend IP was added first, and the ingress was created in AKS afterwards.

In the original cluster I had removed/re-added the IP in App GW a few times without recreating the ingress.
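
For reference, the sequence that worked in the fresher test can be sketched with the Azure CLI. All resource names and addresses here are placeholders, not values from this cluster:

```shell
# 1. Add the private frontend IP configuration to the existing App GW
#    (gateway, resource group, subnet, and vnet names are placeholders).
az network application-gateway frontend-ip create \
  --gateway-name myAppGw \
  --resource-group myRg \
  --name privateFrontendIp \
  --private-ip-address 10.16.20.100 \
  --subnet appGwSubnet \
  --vnet-name myVnet

# 2. Only then create/apply the Ingress in AKS, so AGIC builds the
#    listener against the already-existing frontend.
kubectl apply -f my-ingress.yaml
```

Note that for AGIC to use the private frontend, the Ingress also needs the `appgw.ingress.kubernetes.io/use-private-ip: "true"` annotation.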

Given this, I am closing the issue, but hopefully anyone who sees the same behavior will stumble upon this.