aws-actions/configure-aws-credentials

Intermittent failures when running configure-aws-credentials

ibrahima opened this issue · 2 comments

Describe the bug

I am getting intermittent failures when using configure-aws-credentials, on both v2 (which we were using previously) and v4 (which I just tried updating to). The error is:

Not authorized to perform sts:AssumeRoleWithWebIdentity

The weird thing is, it started happening when I removed some unrelated GitHub Actions from my workflow. The unrelated actions do not authenticate with or interact with AWS at all. Everything was working fine until today, and no changes were made to the OIDC setup or anything like that. I wonder if it's related to rate limiting or throttling?

Expected Behavior

My GitHub workflow successfully gets AWS credentials.

Current Behavior

I receive the error Not authorized to perform sts:AssumeRoleWithWebIdentity. It seems to retry several times before failing:

Run aws-actions/configure-aws-credentials@v4
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Assuming role with OIDC
Error: Could not assume role with OIDC: Not authorized to perform sts:AssumeRoleWithWebIdentity

Reproduction Steps

I don't have steps to reproduce unfortunately because it's intermittent, and the workflow code itself is proprietary.

The workflow does call aws-actions/configure-aws-credentials 5 times in different jobs... not sure if that's relevant.

This is the code where I am using it, but it seems pretty innocuous:

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: ${{ env.ECR_ROLE }}
        aws-region: us-west-2

Possible Solution

Ahhh, I think I found it. The "unrelated" change was adding an environment field to a deployment job in the workflow. This changes the sub claim in the OIDC token, as documented in https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services#configuring-the-role-and-trust-policy. I need to update the trust policy on my IAM role to allow environment-based values in that claim.
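
For anyone hitting the same error, here is a rough sketch of a trust policy that accepts both forms of the sub claim. Per the GitHub docs linked above, a job without an environment gets a sub like repo:ORG/REPO:ref:refs/heads/BRANCH, while a job with an environment gets repo:ORG/REPO:environment:NAME. The account ID, octo-org/octo-repo, the main branch, and the production environment name are all placeholders for illustration, not values from the setup described in this issue.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": [
            "repo:octo-org/octo-repo:ref:refs/heads/main",
            "repo:octo-org/octo-repo:environment:production"
          ]
        }
      }
    }
  ]
}
```

Multiple values under a single condition key are evaluated as OR, so a token matching either pattern is accepted. A broader pattern such as repo:octo-org/octo-repo:* would also work, at the cost of trusting every branch, tag, pull request, and environment in that repository.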

Additional Information/Context

Looking at the CloudTrail logs for AssumeRoleWithWebIdentity, I have noticed something odd. For the ones that fail, it sends only 2 resources in the request:

[
  {
    "resourceType": "AWS::STS::AssumedRole",
    "resourceName": "GitHubActions"
  },
  {
    "resourceType": "AWS::IAM::Role",
    "resourceName": "arn:aws:iam::0123456789:role/my-role"
  }
]

For the ones that succeed, there are 5 resources:

[
  {
    "resourceType": "AWS::IAM::AccessKey",
    "resourceName": "********************"
  },
  {
    "resourceType": "AWS::STS::AssumedRole",
    "resourceName": "GitHubActions"
  },
  {
    "resourceType": "AWS::STS::AssumedRole",
    "resourceName": "arn:aws:sts::0123456789:assumed-role/my-role/GitHubActions"
  },
  {
    "resourceType": "AWS::STS::AssumedRole",
    "resourceName": "*********************:GitHubActions"
  },
  {
    "resourceType": "AWS::IAM::Role",
    "resourceName": "arn:aws:iam::0123456789:role/my-role"
  }
]

I wonder why that could be, and what the difference is.
