appleboy/lambda-action

ResourceNotReady with 0.1.7

marchof opened this issue · 67 comments

Hi, after I saw #46 marked as fixed I tried the latest version 0.1.7.

Now the action hangs for several minutes and fails with the following output:

2023/04/01 10:27:10 ResourceNotReady: exceeded wait attempts
2023/04/01 10:27:10 ResourceNotReady: exceeded wait attempts

Anything I can do from my side to debug or fix this?

I will take it.

@marchof Can you show detailed information on how to reproduce the problem?

Hi @appleboy, it is this action: https://github.com/marchof/io.javaalmanac.sandbox/blob/master/.github/workflows/cd.yml#L114

It fails as soon as I use the 0.1.7 tag. This is the effective configuration which is printed to the build log:

  with:
    aws_access_key_id: ***
    aws_secret_access_key: ***
    aws_region: ***
    function_name: jdk-sandbox-17
    image_uri: ***.dkr.ecr.***.amazonaws.com/javaalmanac/sandbox:lambda-latest-17
    publish: true
    memory_size: 0
    timeout: 0
  env:
    ECR_REPOSITORY: javaalmanac/sandbox
    LATEST_TAG: lambda-latest-17
    AWS_DEFAULT_REGION: ***
    AWS_REGION: ***
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***

@appleboy Let me know if I can test or debug something.

@marchof can you also help to try appleboy/lambda-action@v0.1.5 version?

I am experiencing the same issue after upgrading to v0.1.7. The build hangs for about 5 minutes and then fails with a ResourceNotReady: exceeded wait attempts error.
It works successfully with v0.1.5.

- name: Deploy on AWS lambda
  uses: appleboy/lambda-action@v0.1.7
  with:
      aws_access_key_id: ${{ secrets.AWS_ACCESS_KEY_ID }}
      aws_secret_access_key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      aws_region: ${{ secrets.AWS_REGION }}
      zip_file: bundle.zip
      function_name: ***
      handler: src/serverless.handler
      memory_size: 256
      timeout: 10

I tested with 0.1.5 and received this error:
ResourceConflictException: The operation cannot be performed at this time. An update is in progress

With 0.1.6 and 0.1.7 I get this error:
ResourceNotReady: exceeded wait attempts

@aleon68 @lorenzopolidori @marchof Please help to try it out.

- uses: appleboy/lambda-action@v0.1.7
+ uses: appleboy/lambda-action@030df7b8106f9a2563919cf647b7aa7c5412a425
  with:
      aws_access_key_id: ${{ secrets.AWS_ACCESS_KEY_ID }}
      aws_secret_access_key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      aws_region: ${{ secrets.AWS_REGION }}
      zip_file: bundle.zip
      function_name: ***
      handler: src/serverless.handler
      memory_size: 256
      timeout: 10
+     max_attempts: 200

Thanks.

I tried again with:

uses: 030df7b
with:
......
max_attempts: 200

But after more than 6 minutes it is still running.

The default duration between attempts is five seconds, so CI will fail after 200 * 5 seconds.

This is my yml:

uses: 030df7b
with:
  aws_access_key_id: ${{ env.AWS_ACCESS }}
  aws_secret_access_key: ${{ env.AWS_SECRET }}
  aws_region: ${{ env.AWS_REGION }}
  s3_bucket: ${{ env.AWS_S3_BUILD }}
  function_name: ${{ matrix.lambda-name }}
  zip_file: ${{ matrix.path-file }}/zipFile.zip
  handler: ${{ matrix.lambda-name }}
  description: ...
  environment: ...
  timeout: 90
  memory_size: 512
  runtime: go1.x
  max_attempts: 10

This fails after 2 min:
2023/04/02 04:16:24 ResourceNotReady: exceeded wait attempts

Sorry, not 2 min: it fails 47 seconds into the deploy.

Please set max_attempts to 200 so the action keeps waiting until the resource is ready for an update.

@aleon68 I will check the AWS timeout issue.

Yeah, I just tried, and it failed the same way after 16 min 45 sec.

Thanks

@aleon68

I think 1005 seconds is the correct timeout for that setting. Can you set max_attempts to 500 and test again? Many posts I found suggest that the fix is to raise max_attempts above the default value of 60.

reference: hashicorp/packer#6177

Let me try

Same result
After 42m 18s
2023/04/02 05:52:16 ResourceNotReady: exceeded wait attempts
2023/04/02 05:52:16 ResourceNotReady: exceeded wait attempts

@aleon68 I found the troubleshooting guide:

ResourceNotReadyException

Lambda reclaims network interfaces that aren't in use. This action can place a function in an inactive state. When a function that is inactive is invoked, the function enters a pending state while VPC network access is restored. The first invocation and all others that occur while the function is in a pending state fail and then produce a ResourceNotReadyException error.

To resolve the error, wait until the VPC connection is restored. Then, invoke the Lambda function again.

See https://repost.aws/knowledge-center/lambda-troubleshoot-invoke-error-502-500

OK, I understand, but how can I solve this? All my lambdas are in a VPC. Do I need to delete the VPC before updating?

@appleboy this error is related to invoking lambdas, not to updating them, or am I wrong?

@aleon68 To avoid the problem, we need to check that the Lambda function's last update status is Successful, not Failed or InProgress, before updating the configuration again.

I will try to reproduce the problem.

I can reproduce the following error:

2023/04/02 14:26:19 ResourceConflictException ResourceConflictException: The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:ap-southeast-1:502946233425:function:gorush
{
RespMetadata: {
StatusCode: 409,
RequestID: "e6445013-8af6-4586-ba28-68a09bb235a6"
},
Message_: "The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:ap-southeast-1:502946233425:function:gorush",
Type: "User"
}
2023/04/02 14:26:19 ResourceConflictException: The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:ap-southeast-1:502946233425:function:gorush
{
RespMetadata: {
StatusCode: 409,
RequestID: "e6445013-8af6-4586-ba28-68a09bb235a6"
},
Message_: "The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:ap-southeast-1:502946233425:function:gorush",
Type: "User"
}

but I can't see the ResourceNotReady error.

@aleon68 Please help to try the following version again.

update correct path

appleboy/lambda-action@2ec8254c30163468edbb35fc776836c6b12494ef

Ok, I'll try

Is that the correct hash?

unable to find version 02ec8254c30163468edbb35fc776836c6b12494ef

@aleon68 let me try it.

@aleon68

appleboy/lambda-action@2ec8254c30163468edbb35fc776836c6b12494ef

Ok, I'll try

It's working now!!!!!!

Thanks a lot @appleboy

@aleon68 I will bump the new version later. Thanks for helping with the testing.

Thanks to you for the support

Looks like v0.1.8 does not solve the problem for me:

2023/04/02 12:54:36 ResourceNotReady: exceeded wait attempts
2023/04/02 12:54:36 ResourceNotReady: exceeded wait attempts

My configuration is:

 with:
    aws_access_key_id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws_secret_access_key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    aws_region: ${{ secrets.AWS_REGION }}
    function_name: xxx
    image_uri: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${{ env.LATEST_TAG }}

The last version which works for me is 1e05c13

Bump to new version https://github.com/appleboy/lambda-action/releases/tag/v0.1.8

With v0.1.8 I get the previous error again:

ResourceConflictException: The operation cannot be performed at this time. An update is in progress for resource: ***

And with release 2ec8254 I now get the same error (ResourceConflictException) using exactly the same deploy (this release worked before; now it does not).

@aleon68 @marchof

Please help to try the following commit sha:

appleboy/lambda-action@390dab2546e6c97ca3b94fc5f3863d0e15bec0ee

Post the logs:

2023/04/03 03:07:36 Current State: Active
2023/04/03 03:07:36 Last Update Status: InProgress
2023/04/03 03:07:36 Last Update Status Reason: The function is being created.
2023/04/03 03:07:36 Last Update Status ReasonCode: Creating

@appleboy I tried, but it failed the same way.

@aleon68 try again. I have updated the new version.

Ok, I will try

You will see the log below

Yeah, I see all logs now:

Run 390dab2
2023/04/03 04:25:08 Update function configuration ...
2023/04/03 04:25:08 Current State: Active
2023/04/03 04:25:08 Last Update Status: Successful
2023/04/03 04:25:09 Update function code ...
2023/04/03 04:25:09 Current State: Active
2023/04/03 04:25:09 Last Update Status: InProgress
2023/04/03 04:25:09 Last Update Status Reason: The function is being created.
2023/04/03 04:25:09 Last Update Status ReasonCode: Creating
2023/04/03 04:25:09 Waiting Last Update Status to be successful ...

And it works fine!!!!

Thanks again @appleboy

@marchof Please help to try it out. Waiting for your response. Thanks.

@appleboy Sorry for the late answer. The result with appleboy/lambda-action@390dab2546e6c97ca3b94fc5f3863d0e15bec0ee is:

2023/04/03 09:52:25 Update function configuration ...
2023/04/03 09:52:26 AccessDeniedException: User: arn:aws:iam::***:user/javaalmanac-ecr-upload is not authorized to perform: lambda:GetFunctionConfiguration on resource: arn:aws:lambda:***:***:function:jdk-sandbox-16 because no identity-based policy allows the lambda:GetFunctionConfiguration action
	status code: 403, request id: dfe87caf-3775-4348-bf42-e5efb6d09470

I assume additional permissions are now required. Will add them.

@marchof Thanks for the reminder. I will update the readme.

I tried appleboy/lambda-action@390dab2546e6c97ca3b94fc5f3863d0e15bec0ee again now with the additional permission lambda:GetFunctionConfiguration. It fails after a bit more than 5 minutes with:

2023/04/03 10:06:50 Update function configuration ...
2023/04/03 10:06:51 Current State: Active
2023/04/03 10:06:51 Last Update Status: Successful
2023/04/03 10:06:52 Update function code ...
2023/04/03 10:06:52 Current State: Active
2023/04/03 10:06:52 Last Update Status: InProgress
2023/04/03 10:06:52 Last Update Status Reason: The function is being created.
2023/04/03 10:06:52 Last Update Status ReasonCode: Creating
2023/04/03 10:06:52 Waiting Last Update Status to be successful ...
2023/04/03 10:12:21 ResourceNotReady: exceeded wait attempts
2023/04/03 10:12:21 ResourceNotReady: exceeded wait attempts

Maybe I should mention that my action tries to update an existing function.

@marchof Can you update the max_attempts to 600 or more to wait for the Last Update Status to be successful?

PS. The unit of 600 is seconds.

I tried

max_attempts: 1000

Now the same failure happens after 18min:

2023/04/03 10:21:47 Update function configuration ...
2023/04/03 10:21:47 Current State: Active
2023/04/03 10:21:47 Last Update Status: Successful
2023/04/03 10:21:48 Update function code ...
2023/04/03 10:21:48 Current State: Active
2023/04/03 10:21:48 Last Update Status: InProgress
2023/04/03 10:21:48 Last Update Status Reason: The function is being created.
2023/04/03 10:21:48 Last Update Status ReasonCode: Creating
2023/04/03 10:21:48 Waiting Last Update Status to be successful ...
2023/04/03 10:40:09 ResourceNotReady: exceeded wait attempts
2023/04/03 10:40:09 ResourceNotReady: exceeded wait attempts

@marchof are you only updating the image URI?

image_uri: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${{ env.LATEST_TAG }}

maybe I can try the scenario.

I will try your config

https://github.com/marchof/io.javaalmanac.sandbox/blob/c0380b57214b60ba3dfcd9e39e9eb47d71df4c7e/.github/workflows/cd.yml#L113-L121

    - name: Update Lambda
      uses: appleboy/lambda-action@390dab2546e6c97ca3b94fc5f3863d0e15bec0ee
      with:
        aws_access_key_id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws_secret_access_key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws_region: ${{ secrets.AWS_REGION }}
        function_name: jdk-sandbox-${{ matrix.jdk.version }}
        image_uri: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${{ env.LATEST_TAG }}
        max_attempts: 1000

- name: default deploy
  uses: appleboy/lambda-action@master
  with:
    aws_access_key_id: ${{ secrets.STAGE_DEPLOYER_ACCESS_KEY_ID }}
    aws_secret_access_key: ${{ secrets.STAGE_DEPLOYER_ACCESS_KEY }}
    aws_region: eu-central-1
    function_name: ${{ secrets.FUNCTION_NAME }}
    zip_file: deploy.zip

I cancelled after 9 mins.

@marchof are you only updating the image URI?

Actually not even this: The image URL is still the same. I want the lambda to fetch the latest version.

@marchof @frasulov Can you grant Lambda admin permission to the role? I think it may be a permission issue.

// ErrCodeResourceNotReadyException for service response error code
// "ResourceNotReadyException".
//
// The function is inactive and its VPC connection is no longer available. Wait
// for the VPC connection to reestablish and try again.
ErrCodeResourceNotReadyException = "ResourceNotReadyException"

AWSLambda_FullAccess – Grants full access to Lambda actions and other AWS services used to develop and maintain Lambda resources. This policy was created by scoping down the previous policy AWSLambdaFullAccess.

https://docs.aws.amazon.com/lambda/latest/dg/access-control-identity-based.html
https://docs.aws.amazon.com/lambda/latest/dg/troubleshooting-invocation.html

- name: default deploy
  uses: appleboy/lambda-action@master
  with:
    aws_access_key_id: ${{ secrets.STAGE_DEPLOYER_ACCESS_KEY_ID }}
    aws_secret_access_key: ${{ secrets.STAGE_DEPLOYER_ACCESS_KEY }}
    aws_region: eu-central-1
    function_name: ${{ secrets.FUNCTION_NAME }}
    zip_file: deploy.zip

I cancelled after 9 mins.

Same error here with the same configuration @appleboy
I will revert to commit 38916311abc1205578476b7abc5a0586627e8359 until the issue is resolved...

@appleboy With AWSLambda_FullAccess it just works within 8 seconds 👍

So it looks like an error message about missing permissions is being suppressed. Any chance to get that message? I would really prefer to have a more specific access policy.

I can confirm, lambda:* fixed it.

@daniele-sarnari-blinkoo the https://github.com/appleboy/lambda-action/releases/tag/v0.1.5 version is working for you, right?

Version 0.1.5 is working here as well. Using 0.1.9 I got the same error as others above.

My config on 0.1.9 was:

  - name: Deploy
    id: deploy
    continue-on-error: true
    uses: appleboy/lambda-action@v0.1.5
    with:
      aws_access_key_id: ${{ secrets.PRD_AWS_ACCESS_KEY_ID }}
      aws_secret_access_key: ${{ secrets.PRD_AWS_SECRET_ACCESS_KEY }}
      aws_region: us-east-1
      function_name: ${{ env.MODULE }}
      zip_file: output.zip
      # dry_run: true
      debug: true
      max_attempts: 60

@paulo-gurjao Please also try to grant permission to lambda:*. See #58 (comment)

We ran into this issue and it looks like it was because we were missing the lambda:GetFunction permission. Could you add this to the required permissions in the README?

@marchof @daniele-sarnari-blinkoo Please help to verify @exiareinert-hpa's comment above.

I will do it next week, when I'm back from vacation (no laptop with me, sorry).

I can confirm, lambda:GetFunction fixed the issue.

Pull request linked, @appleboy you can merge and close the issue

@daniele-sarnari-blinkoo Thanks for your confirmation.