IRSA forwarding of metrics to other AWS accounts does not work in latest version of CloudWatch agent
ellen-lau opened this issue · 1 comment
Describe the bug
A couple of months ago I was successfully using IRSA on ROSA to forward metrics from one AWS account's CloudWatch (the account associated with the ROSA cluster running my application pods) to a secondary AWS account's CloudWatch. However, after restarting the pod around a month ago (which pulls amazon/cloudwatch-agent:latest), even with the proper IAM roles set up for metric forwarding, the metrics from my application pod are no longer forwarded to the other AWS account's CloudWatch -- they are only sent to CloudWatch in the AWS account associated with the ROSA cluster running my application pods.
Reverting to version v1.247360.0 or image amazon/cloudwatch-agent:1.247360.0b252689 resolved the issue.
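For context on the IAM setup: cross-account delivery relies on the role in the secondary account trusting the IRSA role attached to the agent's service account in the cluster's account, so the agent can assume it. A minimal sketch of the kind of trust policy the secondary account's role needs, using the same placeholders as the config below:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<primary_aws_account_id>:role/<role_name>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}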
What did you expect to see?
I expected to see the metrics from my application pod forwarded to my secondary AWS account's CloudWatch.
What did you see instead?
I did not see any forwarding; the metrics were only sent to CloudWatch in the AWS account associated with the ROSA cluster.
What version did you use?
I see the issue with amazon/cloudwatch-agent:latest, but do not see it with image amazon/cloudwatch-agent:1.247360.0b252689.
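As a workaround, pinning the deployment's image to the known-good tag instead of latest avoids the regression, e.g.:

      containers:
        - name: cloudwatch-agent
          # pin to the last tag that forwards cross-account correctly
          image: amazon/cloudwatch-agent:1.247360.0b252689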
What config did you use?
# create configmap for prometheus cwagent config
kind: ConfigMap
metadata:
  name: prometheus-cwagentconfig
  namespace: <namespace>
apiVersion: v1
data:
  # cwagent json config
  cwagentconfig.json: |
    {
      "agent": {
        "region": "us-east-1",
        "debug": true,
        "credentials": {
          "role_arn": "arn:aws:iam::<secondary_aws_account_id>:role/<role_name>"
        }
      },
      "logs": {
        "metrics_collected": {
          "prometheus": {
            "cluster_name": "<namespace>",
            "log_group_name": "/aws/containerinsights/<namespace>/prometheus",
            "prometheus_config_path": "/etc/prometheusconfig/prometheus.yaml",
            "emf_processor": {
              "metric_declaration": [
                {
                  "source_labels": ["job"],
                  "label_matcher": "^<namespace>-scrape-job$",
                  "dimensions": [["Namespace","job","pod_name"]],
                  "metric_selectors": [
                    <metric_selectors>
                  ]
                }
              ]
            }
          }
        },
        "force_flush_interval": 5
      }
    }
---
# create configmap for prometheus scrape config
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: <namespace>
apiVersion: v1
data:
  # prometheus config
  prometheus.yaml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 10s
    scrape_configs:
      - job_name: '<namespace>-scrape-job'
        metrics_path: /metrics
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - <namespace>
        tls_config:
          insecure_skip_verify: true
        relabel_configs:
          - source_labels: [__address__]
            action: replace
            target_label: address
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            source_labels:
              - __meta_kubernetes_namespace
            target_label: Namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod_name
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_container_name
            target_label: container_name
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_controller_name
            target_label: pod_controller_name
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_container_port_name
            target_label: port_name
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_container_port_number
            target_label: port_number
---
# create cwagent service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cwagent-prometheus
  namespace: <namespace>
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<primary_aws_account_id>:role/<role_name>"
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cwagent-prometheus-role-binding
subjects:
  - kind: ServiceAccount
    name: cwagent-prometheus
    namespace: <namespace>
roleRef:
  kind: ClusterRole
  name: cwagent-prometheus-role
  apiGroup: rbac.authorization.k8s.io
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cwagent-prometheus
  namespace: <namespace>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cwagent-prometheus
  template:
    metadata:
      labels:
        app: cwagent-prometheus
    spec:
      containers:
        - name: cloudwatch-agent
          image: amazon/cloudwatch-agent:latest
          imagePullPolicy: Always
          resources:
            limits:
              cpu: 1000m
              memory: 1000Mi
            requests:
              cpu: 200m
              memory: 200Mi
          # Please don't change below envs
          env:
            - name: CI_VERSION
              value: "k8s/1.3.8"
            - name: RUN_WITH_IRSA
              value: "True"
          # Please don't change the mountPath
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheusconfig
            - name: prometheus-cwagentconfig
              mountPath: /etc/cwagentconfig
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
        - name: prometheus-cwagentconfig
          configMap:
            name: prometheus-cwagentconfig
      terminationGracePeriodSeconds: 60
      serviceAccountName: cwagent-prometheus
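For completeness, the role referenced by the eks.amazonaws.com/role-arn annotation above follows the standard IRSA arrangement: it is trusted by the cluster's OIDC provider for the cwagent-prometheus service account. A rough sketch of that trust policy, with <oidc_provider> standing in for the cluster's OIDC provider URL (not shown above):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<primary_aws_account_id>:oidc-provider/<oidc_provider>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<oidc_provider>:sub": "system:serviceaccount:<namespace>:cwagent-prometheus"
        }
      }
    }
  ]
}

The ClusterRole cwagent-prometheus-role referenced by the RoleBinding is created separately and is not included here.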
Thank you for bringing this issue to our attention.
We found that the root cause is that the EMF exporter translator is missing a statement to pass the RoleARN from the agent configuration through to the exporter. @SaxyPandaBear linked the PR that addresses this issue, and you can track that PR for progress.