prometheus-community/helm-charts

[kube-prometheus-stack] AMP remoteWrite sigv4 authorization error: server returned HTTP status 403 Forbidden

nikolaops opened this issue · 0 comments

Describe the bug: a clear and concise description of what the bug is.

Prometheus is not authorized to remote write to AWS APS using sigv4 and an assumed role.

What's your helm version?

v3.10.2

What's your kubectl version?

v1.25.4

Which chart?

[kube-prometheus-stack]

What's the chart version?

61.1.1

What happened?

I am trying to configure the prometheus section of kube-prometheus-stack to remote write to AWS Managed Prometheus.
I have created a role with the permissions suggested in the AWS documentation.

The Prometheus container in the prometheus-kube-prometheus-stack-prometheus-0 pod (namespace monitoring) logs this error:

ts=2024-07-03T12:22:54.684Z caller=dedupe.go:112 component=remote level=error remote_name=de7ef6 url=https://aps-workspaces.<REGION>.amazonaws.com/workspaces/<WORKSPACE_ID>/api/v1/remote_write/ msg="non-recoverable error" count=1779 exemplarCount=0 err="server returned HTTP status 403 Forbidden: {\"message\":\"Missing Authentication Token\"}"

where <REGION> and <WORKSPACE_ID> are set correctly to target AWS APS.

I have added this to the values:

prometheus:
  serviceAccount:
    name: "${prometheus_sa}"
    annotations:
      eks.amazonaws.com/role-arn: "${prometheus_role}"
    automountServiceAccountToken: true
  prometheusSpec:
    remoteWrite:
      - url: "https://aps-workspaces.${amp_region}.amazonaws.com/workspaces/${workspace_id}/api/v1/remote_write/"
        sigv4:
          region: "${amp_region}"
          roleArn: "${prometheus_role}"

${prometheus_role} is configured via Terraform:

resource "aws_iam_role" "prometheus_role" {
  name = "${local.cluster_name}-prometheus-remote-amp-write-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Federated = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${data.aws_iam_openid_connect_provider.main.url}"
        },
        Action = "sts:AssumeRoleWithWebIdentity",
        Condition = {
          "StringEquals" = {
            "${data.aws_iam_openid_connect_provider.main.url}:sub" = "system:serviceaccount:monitoring:${local.prometheus_sa}"
          }
        }
      }
    ]
  })
}

resource "aws_iam_policy" "prometheus_policy" {
  name        = "${local.cluster_name}-prometheus-remote-amp-write-policy"
  description = "Allows Prometheus to write remotely to AMP in us-west-2"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "aps:RemoteWrite",
          "aps:GetSeries",
          "aps:GetLabels",
          "aps:GetMetricMetadata",
          "aps:QueryMetrics"
        ],
        Resource = "*"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "prometheus_attach" {
  role       = aws_iam_role.prometheus_role.name
  policy_arn = aws_iam_policy.prometheus_policy.arn
}

If I run

kubectl exec -it prometheus-kube-prometheus-stack-prometheus-0 -n monitoring -- cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token

it outputs a token. When I decode it, the expiration time is in the past (yesterday).
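For reference, the expiry check can be reproduced with a small shell sketch like the one below. The helper names jwt_payload and jwt_exp are made up for illustration. It assumes a base64 that accepts -d (GNU coreutils or BusyBox) and that the token payload carries the standard numeric exp claim.

```shell
# Print the JSON payload (second dot-separated segment) of a JWT read from stdin.
jwt_payload() {
  local seg
  # Take the payload segment and map the base64url alphabet back to plain base64.
  seg=$(cut -d. -f2 | tr '_-' '/+')
  # base64url drops the trailing '=' padding; restore it so base64 accepts the input.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# Print the numeric "exp" claim (seconds since the Unix epoch) of a JWT read from stdin.
jwt_exp() {
  jwt_payload | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

Piping the output of the kubectl exec command above into jwt_exp (run without -it, so no TTY characters are mixed into the token) and comparing the result with the output of date +%s shows whether the mounted token has expired.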
Also, the service account created when the chart is deployed has nothing in its Tokens section:

Name:                <SA_NAME>
Namespace:           monitoring
Labels:              app=kube-prometheus-stack-prometheus
                     app.kubernetes.io/component=prometheus
                     app.kubernetes.io/instance=kube-prometheus-stack
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=kube-prometheus-stack-prometheus
                     app.kubernetes.io/part-of=kube-prometheus-stack
                     app.kubernetes.io/version=61.1.1
                     chart=kube-prometheus-stack-61.1.1
                     heritage=Helm
                     release=kube-prometheus-stack
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_ARN>
                     meta.helm.sh/release-name: kube-prometheus-stack
                     meta.helm.sh/release-namespace: monitoring
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>

These are the secrets created by Helm:

NAME                                                           TYPE                 DATA   AGE
alertmanager-kube-prometheus-stack-alertmanager                Opaque               1      15m
alertmanager-kube-prometheus-stack-alertmanager-generated      Opaque               1      15m
alertmanager-kube-prometheus-stack-alertmanager-tls-assets-0   Opaque               0      15m
alertmanager-kube-prometheus-stack-alertmanager-web-config     Opaque               1      15m
kube-prometheus-stack-admission                                Opaque               3      155m
prometheus-kube-prometheus-stack-prometheus                    Opaque               1      15m
prometheus-kube-prometheus-stack-prometheus-tls-assets-0       Opaque               1      15m
prometheus-kube-prometheus-stack-prometheus-web-config         Opaque               1      15m
sh.helm.release.v1.kube-prometheus-stack.v1                    helm.sh/release.v1   1      15m

The chart is also deployed via Terraform:

resource "helm_release" "prometheus_stack" {
  name             = "kube-prometheus-stack"
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "kube-prometheus-stack"
  namespace        = "monitoring"
  create_namespace = true
  version          = "61.1.1"
  values = [
    # templatefile() already returns a string, so no extra "${...}" wrapper is needed
    templatefile("${path.module}/template_files/prometheus.yaml.tftpl", {
      apps_locals     = local.apps_locals,
      prometheus_sa   = local.prometheus_sa,
      amp_region      = var.amp_region,
      workspace_id    = var.amp_centralized_workspace_id,
      prometheus_role = local.prometheus_role
    })
  ]
}

What you expected to happen?

Metrics are written remotely to the APS workspace.

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

terraform apply

Anything else we need to know?

No response