Configured IRSA is not being used: exporter logs show a different IAM role ARN when fetching metrics
Vaibhav-1995 opened this issue · 2 comments
What did you do?
Deployed prometheus-cloudwatch-exporter in a Kubernetes environment with the IRSA configuration and the aws-iam-role configuration set in values.yaml, as shown in the configuration file below. The exporter logs show an error stating that a different IAM role, the one assigned to the instance itself, does not have the permissions needed to access the metrics.
What did you expect to see?
We created the IRSA role with the required policy permissions and referenced it in the configuration file, so the exporter should scrape metrics from CloudWatch.
Are you currently working around this issue?
No. I cannot understand why it is using the role applied at the instance level instead (details in the log message below).
Environment
- Exporter version: 0.25.1
- Running in containers?: yes
- Using the official image?: yes [quay.io/prometheus/cloudwatch-exporter:latest]
Exporter configuration file
# Default values for prometheus-cloudwatch-exporter.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
replicaCount: 1
image:
repository: xx-xxxxx.xxxxxx.xxxxx/tanzu/containers/prometheus/cloudwatch-exporter
# if not set appVersion field from Chart.yaml is used
tag: latest
pullPolicy: IfNotPresent
pullSecrets:
# - name: "image-pull-secret"
# Example proxy configuration:
# command:
# - 'java'
# - '-Dhttp.proxyHost=proxy.example.com'
# - '-Dhttp.proxyPort=3128'
# - '-Dhttps.proxyHost=proxy.example.com'
# - '-Dhttps.proxyPort=3128'
# - '-jar'
# - '/cloudwatch_exporter.jar'
# - '9106'
# - '/config/config.yml'
command: []
containerPort: 9106
service:
type: ClusterIP
port: 9106
portName: http
annotations: {}
labels: {}
pod:
labels: {}
annotations: {}
# Labels and annotations to attach to the deployment resource
deployment:
labels: {}
annotations: {}
# Extra environment variables
extraEnv:
# - name: foo
# value: baa
resources: {}
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
aws:
role:
iam.amazonaws.com/role: 'arn:aws:iam::xxxxxxxxxxxx:role/prom-cloudwatch-exporter-role'
# Enables usage of regional STS endpoints rather than global which is default
stsRegional:
enabled: false
# The name of a pre-created secret in which AWS credentials are stored. When
# set, aws_access_key_id is assumed to be in a field called access_key,
# aws_secret_access_key is assumed to be in a field called secret_key, and the
# session token, if it exists, is assumed to be in a field called
# security_token
secret:
name:
includesSessionToken: false
# Note: Do not specify the aws_access_key_id and aws_secret_access_key if you specified role or secret.name before
aws_access_key_id:
aws_secret_access_key:
serviceAccount:
# Specifies whether a ServiceAccount should be created
create: true
# The name of the ServiceAccount to use.
# If not set and create is true, a name is generated using the fullname template
name:
# annotations:
# Will add the provided map to the annotations for the created serviceAccount
# e.g.
annotations:
eks.amazonaws.com/role-arn: 'arn:aws:iam::xxxxxxxxxxxx:role/prom-cloudwatch-exporter-role'
# eks.amazonaws.com/role-arn: arn:aws:iam::1234567890:role/prom-cloudwatch-exporter-oidc
# eks.amazonaws.com/sts-regional-endpoints: "true"
# Specifies whether to automount API credentials for the ServiceAccount.
automountServiceAccountToken: true
rbac:
# Specifies whether RBAC resources should be created
create: true
# Configuration is rendered with `tpl` function, therefore you can use any Helm variables and/or templates here
config: |-
# This is the default configuration for prometheus-cloudwatch-exporter
region: ap-south-1
period_seconds: 240
metrics:
- aws_dimensions:
- InstanceId
aws_metric_name: CPUUtilization
aws_namespace: AWS/EC2
aws_statistics:
- Average
aws_tag_select:
resource_type_selection: ec2:instance
resource_id_dimension: InstanceId
- aws_dimensions:
- InstanceId
aws_metric_name: NetworkIn
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: NetworkOut
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: NetworkPacketsIn
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: NetworkPacketsOut
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: DiskWriteBytes
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: DiskReadBytes
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: CPUCreditBalance
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: CPUCreditUsage
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: StatusCheckFailed
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: StatusCheckFailed_Instance
aws_namespace: AWS/EC2
aws_statistics:
- Average
- aws_dimensions:
- InstanceId
aws_metric_name: StatusCheckFailed_System
aws_namespace: AWS/EC2
aws_statistics:
- Average
# - aws_namespace: AWS/ELB
# aws_metric_name: UnHealthyHostCount
# aws_dimensions: [AvailabilityZone, LoadBalancerName]
# aws_statistics: [Average]
# - aws_namespace: AWS/ELB
# aws_metric_name: RequestCount
# aws_dimensions: [AvailabilityZone, LoadBalancerName]
# aws_statistics: [Sum]
# - aws_namespace: AWS/ELB
# aws_metric_name: Latency
# aws_dimensions: [AvailabilityZone, LoadBalancerName]
# aws_statistics: [Average]
# - aws_namespace: AWS/ELB
# aws_metric_name: SurgeQueueLength
# aws_dimensions: [AvailabilityZone, LoadBalancerName]
# aws_statistics: [Maximum, Sum]
nodeSelector: {}
tolerations: []
affinity: {}
# Configurable health checks against the /healthy and /ready endpoints
livenessProbe:
path: /-/healthy
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
readinessProbe:
path: /-/ready
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
serviceMonitor:
# When set true then use a ServiceMonitor to configure scraping
enabled: false
# Set the namespace the ServiceMonitor should be deployed
# namespace: monitoring
# Set how frequently Prometheus should scrape
# interval: 30s
# Set path to cloudwatch-exporter telemetry-path
# telemetryPath: /metrics
# Set labels for the ServiceMonitor, use this to define your scrape label for Prometheus Operator
# labels:
# Set timeout for scrape
# timeout: 10s
# Set relabelings for the ServiceMonitor, use to apply to samples before scraping
# relabelings: []
# Set metricRelabelings for the ServiceMonitor, use to apply to samples for ingestion
# metricRelabelings: []
#
# Example - note the Kubernetes convention of camelCase instead of Prometheus' snake_case
# metricRelabelings:
# - sourceLabels: [dbinstance_identifier]
# action: replace
# replacement: mydbname
# targetLabel: dbname
prometheusRule:
# Specifies whether a PrometheusRule should be created
enabled: false
# Set the namespace the PrometheusRule should be deployed
# namespace: monitoring
# Set labels for the PrometheusRule, use this to define your scrape label for Prometheus Operator
# labels:
# Example - note the Kubernetes convention of camelCase instead of Prometheus' snake_case
# rules:
# - alert: EBS-Low-BurstBalance
# annotations:
# message: The EBS BurstBalance during the last 10 minutes is lower than 80%.
# expr: aws_ebs_burst_balance_average < 80
# for: 10m
# labels:
# severity: warning
# - alert: EBS-Low-BurstBalance
# annotations:
# message: The EBS BurstBalance during the last 10 minutes is lower than 50%.
# expr: aws_ebs_burst_balance_average < 50
# for: 10m
# labels:
# severity: warning
# - alert: EBS-Low-BurstBalance
# annotations:
# message: The EBS BurstBalance during the last 10 minutes is lower than 30%.
# expr: aws_ebs_burst_balance_average < 30
# for: 10m
# labels:
# severity: critical
ingress:
enabled: false
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
labels: {}
path: /
hosts:
- chart-example.local
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
# For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
# See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
# ingressClassName: nginx
# pathType is only for k8s >= 1.18
pathType: Prefix
securityContext:
runAsUser: 65534 # run as nobody user instead of root
fsGroup: 65534 # necessary to be able to read the EKS IAM token
containerSecurityContext: {}
# allowPrivilegeEscalation: false
# readOnlyRootFilesystem: true
# Leverage a PriorityClass to ensure your pods survive resource shortages
# ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
# priorityClassName: system-cluster-critical
priorityClassName: ""
Logs
WARNING: CloudWatch scrape failed
software.amazon.awssdk.services.resourcegroupstaggingapi.model.ResourceGroupsTaggingApiException: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/eks_node_group-dev-iamrole/i-xxxxxxxxxxxxxxx is not authorized to perform: tag:GetResources because no identity-based policy allows the tag:GetResources action (Service: ResourceGroupsTaggingApi, Status Code: 400, Request ID: c4d034ac-36c5-4d25-8e53-343d6b089915)
Hi team,
Any update on the above issue?
Your help would be appreciated!
Thanks!
Resolved on my end.
The ServiceAccount name generated by the Helm deployment did not match the ServiceAccount name I had provided in the IAM role's trust relationship, so the exporter fell back to the EKS node IAM role.
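For anyone hitting the same problem: the sub condition in the IRSA role's trust relationship must match the ServiceAccount name the chart actually creates (visible via kubectl get serviceaccount, or in the pod's serviceAccountName). A sketch of the trust policy is below, in the JSON form the IAM console expects; the OIDC provider ID, namespace, and ServiceAccount name are placeholders to replace with your own values, and the ap-south-1 region is taken from the configuration above.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxxxxxxxxxxx:oidc-provider/oidc.eks.ap-south-1.amazonaws.com/id/<OIDC_PROVIDER_ID>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.ap-south-1.amazonaws.com/id/<OIDC_PROVIDER_ID>:sub": "system:serviceaccount:<namespace>:<helm-generated-serviceaccount-name>"
        }
      }
    }
  ]
}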