Configured IRSA is not being used: exporter logs show a different IAM role ARN when fetching metrics
Vaibhav-1995 opened this issue · 2 comments
What did you do?
Deployed prometheus-cloudwatch-exporter in a Kubernetes environment with the IRSA configuration and the aws-iam-role configuration set in values.yaml, as shown in the configuration file below. The exporter logs show an error stating that a different IAM role, the one assigned to the instance itself, does not have the permissions needed to access the metrics.
What did you expect to see?
We created the IRSA role with the required policy permissions and referenced it in the configuration file, so the exporter should scrape metrics from CloudWatch.
Are you currently working around this issue?
No. I cannot understand why it is using the role applied at the instance level instead (details in the log message below).
Environment
- Exporter version: 0.25.1
- Running in containers?: yes
- Using the official image?: yes [quay.io/prometheus/cloudwatch-exporter:latest]
Exporter configuration file
# Default values for prometheus-cloudwatch-exporter.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
replicaCount: 1
image:
repository: xx-xxxxx.xxxxxx.xxxxx/tanzu/containers/prometheus/cloudwatch-exporter
# if not set appVersion field from Chart.yaml is used
tag: latest
pullPolicy: IfNotPresent
pullSecrets:
# - name: "image-pull-secret"
# Example proxy configuration:
# command:
# - 'java'
# - '-Dhttp.proxyHost=proxy.example.com'
# - '-Dhttp.proxyPort=3128'
# - '-Dhttps.proxyHost=proxy.example.com'
# - '-Dhttps.proxyPort=3128'
# - '-jar'
# - '/cloudwatch_exporter.jar'
# - '9106'
# - '/config/config.yml'
command: []
containerPort: 9106
service:
type: ClusterIP
port: 9106
portName: http
annotations: {}
labels: {}
pod:
labels: {}
annotations: {}
# Labels and annotations to attach to the deployment resource
deployment:
labels: {}
annotations: {}
# Extra environment variables
extraEnv:
# - name: foo
# value: baa
resources: {}
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
aws:
role:
iam.amazonaws.com/role: 'arn:aws:iam::xxxxxxxxxxxx:role/prom-cloudwatch-exporter-role'
# Enables usage of regional STS endpoints rather than global which is default
stsRegional:
enabled: false
# The name of a pre-created secret in which AWS credentials are stored. When
# set, aws_access_key_id is assumed to be in a field called access_key,
# aws_secret_access_key is assumed to be in a field called secret_key, and the
# session token, if it exists, is assumed to be in a field called
# security_token
secret:
name:
includesSessionToken: false
# Note: Do not specify the aws_access_key_id and aws_secret_access_key if you specified role or secret.name before
aws_access_key_id:
aws_secret_access_key:
serviceAccount:
# Specifies whether a ServiceAccount should be created
create: true
# The name of the ServiceAccount to use.
# If not set and create is true, a name is generated using the fullname template
name:
# annotations:
# Will add the provided map to the annotations for the created serviceAccount
# e.g.
annotations:
eks.amazonaws.com/role-arn: 'arn:aws:iam::xxxxxxxxxxxx:role/prom-cloudwatch-exporter-role'
# eks.amazonaws.com/role-arn: arn:aws:iam::1234567890:role/prom-cloudwatch-exporter-oidc
# eks.amazonaws.com/sts-regional-endpoints: "true"
# Specifies whether to automount API credentials for the ServiceAccount.
automountServiceAccountToken: true
rbac:
# Specifies whether RBAC resources should be created
create: true
# Configuration is rendered with `tpl` function, therefore you can use any Helm variables and/or templates here
config: |-
# This is the default configuration for prometheus-cloudwatch-exporter
region: ap-south-1
period_seconds: 240
metrics:
- aws_dimensions:
- InstanceId
aws_metric_name: CPUUtilization
aws_namespace: AWS/EC2
aws_statistics:
- Average
aws_tag_select:
resource_type_selection: ec2:instance
resource_id_dimension: InstanceId
- aws_dimensions:
- InstanceId
aws_metric_name: NetworkIn
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: NetworkOut
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: NetworkPacketsIn
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: NetworkPacketsOut
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: DiskWriteBytes
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: DiskReadBytes
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: CPUCreditBalance
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: CPUCreditUsage
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: StatusCheckFailed
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: StatusCheckFailed_Instance
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: StatusCheckFailed_System
aws_namespace: AWS/EC2
aws_statistics:
- Average
# - aws_namespace: AWS/ELB
# aws_metric_name: UnHealthyHostCount
# aws_dimensions: [AvailabilityZone, LoadBalancerName]
# aws_statistics: [Average]
# - aws_namespace: AWS/ELB
# aws_metric_name: RequestCount
# aws_dimensions: [AvailabilityZone, LoadBalancerName]
# aws_statistics: [Sum]
# - aws_namespace: AWS/ELB
# aws_metric_name: Latency
# aws_dimensions: [AvailabilityZone, LoadBalancerName]
# aws_statistics: [Average]
# - aws_namespace: AWS/ELB
# aws_metric_name: SurgeQueueLength
# aws_dimensions: [AvailabilityZone, LoadBalancerName]
# aws_statistics: [Maximum, Sum]
nodeSelector: {}
tolerations: []
affinity: {}
# Configurable health checks against the /healthy and /ready endpoints
livenessProbe:
path: /-/healthy
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
readinessProbe:
path: /-/ready
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
serviceMonitor:
# When set true then use a ServiceMonitor to configure scraping
enabled: false
# Set the namespace the ServiceMonitor should be deployed
# namespace: monitoring
# Set how frequently Prometheus should scrape
# interval: 30s
# Set path to cloudwatch-exporter telemetry-path
# telemetryPath: /metrics
# Set labels for the ServiceMonitor, use this to define your scrape label for Prometheus Operator
# labels:
# Set timeout for scrape
# timeout: 10s
# Set relabelings for the ServiceMonitor, use to apply to samples before scraping
# relabelings: []
# Set metricRelabelings for the ServiceMonitor, use to apply to samples for ingestion
# metricRelabelings: []
#
# Example - note the Kubernetes convention of camelCase instead of Prometheus' snake_case
# metricRelabelings:
# - sourceLabels: [dbinstance_identifier]
# action: replace
# replacement: mydbname
# targetLabel: dbname
prometheusRule:
# Specifies whether a PrometheusRule should be created
enabled: false
# Set the namespace the PrometheusRule should be deployed
# namespace: monitoring
# Set labels for the PrometheusRule, use this to define your scrape label for Prometheus Operator
# labels:
# Example - note the Kubernetes convention of camelCase instead of Prometheus' snake_case
# rules:
# - alert: EBS-Low-BurstBalance
# annotations:
# message: The EBS BurstBalance during the last 10 minutes is lower than 80%.
# expr: aws_ebs_burst_balance_average < 80
# for: 10m
# labels:
# severity: warning
# - alert: EBS-Low-BurstBalance
# annotations:
# message: The EBS BurstBalance during the last 10 minutes is lower than 50%.
# expr: aws_ebs_burst_balance_average < 50
# for: 10m
# labels:
# severity: warning
# - alert: EBS-Low-BurstBalance
# annotations:
# message: The EBS BurstBalance during the last 10 minutes is lower than 30%.
# expr: aws_ebs_burst_balance_average < 30
# for: 10m
# labels:
# severity: critical
ingress:
enabled: false
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
labels: {}
path: /
hosts:
- chart-example.local
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
# For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
# See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
# ingressClassName: nginx
# pathType is only for k8s >= 1.18
pathType: Prefix
securityContext:
runAsUser: 65534 # run as nobody user instead of root
fsGroup: 65534 # necessary to be able to read the EKS IAM token
containerSecurityContext: {}
# allowPrivilegeEscalation: false
# readOnlyRootFilesystem: true
# Leverage a PriorityClass to ensure your pods survive resource shortages
# ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
# priorityClassName: system-cluster-critical
priorityClassName: ""
Logs
WARNING: CloudWatch scrape failed
software.amazon.awssdk.services.resourcegroupstaggingapi.model.ResourceGroupsTaggingApiException: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/eks_node_group-dev-iamrole/i-xxxxxxxxxxxxxxx is not authorized to perform: tag:GetResources because no identity-based policy allows the tag:GetResources action (Service: ResourceGroupsTaggingApi, Status Code: 400, Request ID: c4d034ac-36c5-4d25-8e53-343d6b089915)
Hi team,
Any update on the above issue?
Your help would be appreciated!
Thanks!
Resolved on my end.
The ServiceAccount name generated by the Helm deployment did not match the ServiceAccount name I had provided in the IAM role's trust relationship, so the exporter fell back to the EKS node IAM role.
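For anyone hitting the same problem: the sub condition in the IRSA role's trust relationship must match the ServiceAccount name the chart actually creates (visible via kubectl get serviceaccount, or in the pod's serviceAccountName). A sketch of the trust policy is below, in the JSON form the IAM console expects; the OIDC provider ID, namespace, and ServiceAccount name are placeholders to replace with your own values, and the ap-south-1 region is taken from the configuration above.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxxxxxxxxxxx:oidc-provider/oidc.eks.ap-south-1.amazonaws.com/id/<OIDC_PROVIDER_ID>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.ap-south-1.amazonaws.com/id/<OIDC_PROVIDER_ID>:sub": "system:serviceaccount:<namespace>:<helm-generated-serviceaccount-name>"
        }
      }
    }
  ]
}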