aws/aws-codedeploy-agent

CodeDeploy fails at "DownloadBundle" step

Closed this issue · 16 comments

After the recent AWS CodeDeploy Agent update (version 1.1.1), all of my CodeDeploy deployments have failed at the "DownloadBundle" step.

In the console, this shows an error similar to:

Destination '/opt/codedeploy-agent/deployment-root/abcde123-4567-89edc-baabc-de1234567/d-ABCD1234/deployment-archive/appspec.yml' already exists

This occurs even on newly launched instances with a fresh install of the CodeDeploy agent. As a result, ASG scale-outs fail when the new version of the agent is used, which in turn causes a "scaling loop".

To rectify this, we have rolled out a downgrade of the agent to use version 1.1.0. This appears to resolve the issue.

From what I can see, this affects Linux-based instances (Amazon Linux 1 and 2 in my testing); however, others may be affected as well.

Same issue

As a workaround for your user data you can explicitly install version 1.1.0 of the Agent that does not have the issue. Check our docs:

sudo ./install auto -v releases/codedeploy-agent-1.1.0-4.noarch.rpm

Thanks a lot, but this workaround was already described by the topic starter. Our auto-scaling configuration is too complex and it's very difficult to apply changes on the go, so we can't use the suggested fix. Is it possible to provide an estimate of when this issue will be solved completely?

@davidreid @siarzukpiatrouski sorry you ran into the same issue. We are trying to reproduce it but have had no luck so far. Would you mind sharing more information about your setup and answering some questions to help us investigate and fix the issue?

  • How do you install the CodeDeploy host agent?
  • Are you using a deployment revision from GitHub or S3?
  • From the stack trace, it looks like you are using the ZIP bundle type. Can you confirm?
  • Is this issue consistent on all instances?
  • If possible, can you share the verbose agent log with the error? You can follow the instructions here to turn it on
  • Would you mind sharing the high-level structure of your unzipped bundle and maybe the rough bundle size as well?

Thanks!

Hey @yubangxi

This is what I have seen so far (I work for an MSP within the AWS Partner Network):

  • This appears to affect Linux-based machines (so far I've seen this on Amazon Linux 1/2)
  • All the instances are within an ASG and have Lifecycle Hooks into CodeDeploy
  • All the bundles are ZIP files
  • I've seen this issue across ~8 AWS Accounts (with a fear this will affect more during a scale-out or deployment), affecting a large number of applications (all running within their own ASGs)
  • For most of these applications, appspec.yml is in the root of the zip, along with further scripts (at each stage) to trigger Ansible, move our customer's application into its respective directory, and handle other bootstrapping
  • The bundle sizes vary massively from a few MBs to a couple hundred MB

Initially, we were installing the CodeDeploy Agent (through UserData) as per the docs. So, wget the file from S3, chmod +x and then ./install auto

Now, we have reverted the affected instances to perform a yum install of the version specific rpm. Upon using the older version, CodeDeploy works as expected.

What's strange to me is why CodeDeploy is failing at the DownloadBundle step, with the reason suggesting that the files (which the agent itself has put there) shouldn't be there. It seems like the bundle is downloaded first and the check to see whether the directory is non-empty only comes in afterwards?

This is happening on both freshly launched instances and existing instances (with previous deployments).

I hope this helps further?

@davidreid thanks for the response. A couple more questions:

  • Will you be able to provide more context from CodeDeploy agent log, e.g. more lines above the error line?
  • Regarding "I've seen this issue across ~8 AWS Accounts": for those accounts where you are seeing impact, with the same deployment bundle, does the deployment always fail with this error? Or did the deployment actually succeed sometimes?

In the V1.1.1 update, one change we made was to use the Linux unzip command, when it's available, instead of the Ruby zip library to unpack the bundle. It looks like something is going wrong while running the unzip command in this case. It would be very helpful if you could try unzip -qo (the command the agent runs) on your bundle and see whether anything goes wrong. We are not able to reproduce the issue, so we suspect it's related to a specific bundle format/structure.
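For anyone wanting to run that check by hand, here is a minimal sketch that extracts a bundle into a scratch directory with unzip -qo and reports the exit status. The function names check_bundle and describe_unzip_status are made up for illustration and are not part of the agent; the status descriptions are paraphrased from the unzip man page, and only the most common codes are mapped.

```shell
#!/bin/sh
# Hypothetical helpers for testing a bundle by hand; not part of the agent.

# Map an unzip exit status to a short description (common codes only).
describe_unzip_status() {
  case "$1" in
    0) echo "ok" ;;
    1) echo "completed with warnings" ;;
    2) echo "generic error in zipfile format" ;;
    3) echo "severe error in zipfile format" ;;
    *) echo "other error (see the unzip man page)" ;;
  esac
}

# Extract the bundle into a throwaway directory and report what unzip returned.
check_bundle() {
  bundle="$1"
  scratch="$(mktemp -d)" || return 1
  status=0
  unzip -qo "$bundle" -d "$scratch" || status=$?
  rm -rf "$scratch"
  echo "unzip exit status $status: $(describe_unzip_status "$status")"
  return "$status"
}
```

Usage would be something like check_bundle bundle.zip. A non-zero status does not necessarily mean that nothing was extracted, so it is worth reading any warnings unzip prints as well.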

Hi, @yubangxi

We are using CloudFormation stacks for the ASG, and the OS is Amazon Linux 1. The bundle to deploy is a ZIP (~250 MB) from S3 with a TAR file inside (for example, /opt/codedeploy-agent/deployment-root/----/d-8LB39XKI4/deployment-archive/html/magento.tar)

We solved this issue by changing the command in the stack template from:

"command": "wget --quiet https://s3.amazonaws.com/aws-codedeploy-us-east-1/latest/install -O /tmp/codedeploy_install && /usr/bin/ruby /tmp/codedeploy_install auto",

to:

"command": "wget --quiet https://s3.amazonaws.com/aws-codedeploy-us-east-1/latest/install -O /tmp/codedeploy_install && /usr/bin/ruby /tmp/codedeploy_install auto -v releases/codedeploy-agent-1.1.0-4.noarch.rpm",


Update

[test]# unzip -qo bundle.zip 
warning:  stripped absolute path spec from /
mapname:  conversion of  failed
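The "stripped absolute path spec" warning above suggests the archive contains at least one entry whose name starts with "/". A rough sketch for spotting such entries, assuming zipinfo is installed alongside unzip (absolute_entries and list_absolute_entries are hypothetical helper names):

```shell
#!/bin/sh
# Hypothetical helpers; they only inspect entry names and extract nothing.

# Read entry names (one per line) on stdin and print those with a leading "/".
absolute_entries() {
  grep '^/' || true
}

# List the entry names of an archive and keep only the absolute ones.
list_absolute_entries() {
  zipinfo -1 "$1" | absolute_entries
}
```

Running list_absolute_entries bundle.zip prints nothing if every entry name is relative.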

Update 2

[codedeploy-agent(3814)]: master 3814: Spawned child 1/1
[codedeploy-agent(3818)]: On Premises config file does not exist or not readable
[codedeploy-agent(3818)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandExecutor: Archives to retain is: 5}
[codedeploy-agent(3818)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.1.1-1850_rpm.
[codedeploy-agent(3818)]: [Aws::CodeDeployCommand::Client 200 0.34604 0 retries] poll_host_command(host_identifier:"---REDACTED---")
[codedeploy-agent(3818)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.1.1-1850_rpm.
[codedeploy-agent(3818)]: [Aws::CodeDeployCommand::Client 200 0.04732 0 retries] put_host_command_acknowledgement(diagnostics:nil,host_command_identifier:"---REDACTED---")
[codedeploy-agent(3818)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.1.1-1850_rpm.
[codedeploy-agent(3818)]: [Aws::CodeDeployCommand::Client 200 0.192873 0 retries] get_deployment_specification(deployment_execution_id:"---REDACTED---",host_identifier:"---REDACTED---")
[codedeploy-agent(3818)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.1.1-1850_rpm.
[codedeploy-agent(3818)]: [Aws::CodeDeployCommand::Client 200 0.023097 0 retries] put_host_command_complete(command_status:"Succeeded",diagnostics:{format:"JSON",payload:"{\"error_code\":0,\"script_name\":\"\",\"message\":\"Succeeded\",\"log\":\"\"}"},host_command_identifier:"---REDACTED---")
[codedeploy-agent(3814)]: Started master 3814 with 1 children
[codedeploy-agent(3818)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.1.1-1850_rpm.
[codedeploy-agent(3818)]: [Aws::CodeDeployCommand::Client 200 0.039388 0 retries] poll_host_command(host_identifier:"---REDACTED---")
[codedeploy-agent(3818)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.1.1-1850_rpm.
[codedeploy-agent(3818)]: [Aws::CodeDeployCommand::Client 200 0.03909 0 retries] put_host_command_acknowledgement(diagnostics:nil,host_command_identifier:"---REDACTED---")
[codedeploy-agent(3818)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.1.1-1850_rpm.
[codedeploy-agent(3818)]: [Aws::CodeDeployCommand::Client 200 0.021023 0 retries] get_deployment_specification(deployment_execution_id:"---REDACTED---",host_identifier:"---REDACTED---")
[codedeploy-agent(3818)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.1.1-1850_rpm.
ERROR [codedeploy-agent(3818)]: InstanceAgent::LinuxUtil: Error extracting zip archive: 2
[codedeploy-agent(3818)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.1.1-1850_rpm.
[codedeploy-agent(3818)]: [Aws::CodeDeployCommand::Client 200 0.042265 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Destination '/opt/codedeploy-agent/deployment-root/---REDACTED---/d-BLSTX0QI4/deployment-archive/html/magento.tar' already exists\",\"log\":\"\"}"},host_command_identifier:"---REDACTED---")
ERROR [codedeploy-agent(3818)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: Zip::DestinationFileExistsError - Destination '/opt/codedeploy-agent/deployment-root/---REDACTED---/d-BLSTX0QI4/deployment-archive/html/magento.tar' already exists - /opt/codedeploy-agent/vendor/gems/rubyzip-1.1.7/lib/zip/entry.rb:579:in `create_file'

/opt/codedeploy-agent/vendor/gems/rubyzip-1.1.7/lib/zip/entry.rb:154:in `extract'
/opt/codedeploy-agent/vendor/gems/rubyzip-1.1.7/lib/zip/file.rb:301:in `extract'
/opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_executor.rb:391:in `block (2 levels) in unpack_bundle'
/opt/codedeploy-agent/vendor/gems/rubyzip-1.1.7/lib/zip/entry_set.rb:42:in `call'
/opt/codedeploy-agent/vendor/gems/rubyzip-1.1.7/lib/zip/entry_set.rb:42:in `block in each'
/opt/codedeploy-agent/vendor/gems/rubyzip-1.1.7/lib/zip/entry_set.rb:41:in `each'
/opt/codedeploy-agent/vendor/gems/rubyzip-1.1.7/lib/zip/entry_set.rb:41:in `each'
/opt/codedeploy-agent/vendor/gems/rubyzip-1.1.7/lib/zip/central_directory.rb:182:in `each'
/opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_executor.rb:388:in `block in unpack_bundle'
/opt/codedeploy-agent/vendor/gems/rubyzip-1.1.7/lib/zip/file.rb:99:in `open'
/opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_executor.rb:387:in `rescue in unpack_bundle'
/opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_executor.rb:384:in `unpack_bundle'
/opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_executor.rb:113:in `block in <class:CommandExecutor>'
/opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_executor.rb:68:in `execute_command'
/opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_poller.rb:115:in `process_command'
/opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_poller.rb:97:in `acknowledge_and_process_command'
/opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_poller.rb:76:in `block in perform'
/opt/codedeploy-agent/vendor/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:348:in `call'
/opt/codedeploy-agent/vendor/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:348:in `run_task'
/opt/codedeploy-agent/vendor/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:337:in `block (3 levels) in create_worker'
/opt/codedeploy-agent/vendor/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `loop'
/opt/codedeploy-agent/vendor/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `block (2 levels) in create_worker'
/opt/codedeploy-agent/vendor/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `catch'
/opt/codedeploy-agent/vendor/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `block in create_worker'
/opt/codedeploy-agent/vendor/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `call'
/opt/codedeploy-agent/vendor/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'

WARN [codedeploy-agent(3818)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Calling PutHostCommandComplete: "Code Error"
[codedeploy-agent(3818)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.1.1-1850_rpm.
[codedeploy-agent(3818)]: [Aws::CodeDeployCommand::Client 200 0.020036 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Destination '/opt/codedeploy-agent/deployment-root/---REDACTED---/d-BLSTX0QI4/deployment-archive/html/magento.tar' already exists\",\"log\":\"\"}"},host_command_identifier:"---REDACTED---")
No newer events at this moment.

@yubangxi

For those accounts where you are seeing impact, with the same deployment bundle, does the deployment always fail with this error? Or did the deployment actually succeed sometimes?

Sorry, let me clarify. The deployment bundles are different for every account and every application. They failed consistently across all of them.

In regards to the CodeDeploy Agent logs, I have pulled these from earlier today, in CloudWatch: https://p.lee.io/9e683bae-8729-402a-9d6a-3237d660a4f5#encryptionKey=R0dr6sbqOeNe97UuZDPHKI3P

In regards to the unzip -qo - this looks to succeed with the same output as @siarzukpiatrouski

Agent version 1.1.1 is rolled back


Have attempted a scale out on an ASG using the method that was causing a problem:

wget https://aws-codedeploy-$REGION.s3.amazonaws.com/latest/install 
chmod +x ./install 
./install

...and can confirm it now installs version 1.1.0, and deployments occur without error.
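To double-check which agent version an instance ended up with, something like the sketch below could be used. It assumes the version file at /opt/codedeploy-agent/.version (the path shown in the agent log above) contains the version string in the same form the log prints, e.g. OFFICIAL_1.1.1-1850_rpm; the function name agent_is_affected is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; assumes the version file contains a string such as
# OFFICIAL_1.1.1-1850_rpm, as printed in the agent log.

# Exit status: 0 if the file mentions a 1.1.1 build, 1 if it does not,
# 2 if the file cannot be read.
agent_is_affected() {
  version_file="${1:-/opt/codedeploy-agent/.version}"
  [ -r "$version_file" ] || return 2
  grep -q 'OFFICIAL_1\.1\.1-' "$version_file"
}
```

For example: agent_is_affected && echo "agent 1.1.1 is installed on this instance".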

Meantime, some communication via the support tickets that have been raised for this in the AWS Console would be much appreciated. Also while a rollback looks to have been completed on your end, your documentation does not reflect this.

Agreed with @leytonreed

The install script in S3 now installs version 1.1.0, and deployments succeed.

Similarly, I saw a number of "Action Required" notifications sent to the various affected AWS Accounts that had used version 1.1.1. However, the CodeDeploy documentation doesn't make it clear that you have rolled back the version.

@davidreid @siarzukpiatrouski Thanks for your information. As a follow-up to @yubangxi's questions, could you let us know how the bundle is zipped, that is, which commands were used to create it?

@siarzukpiatrouski @davidreid @leytonreed thanks for your help in assisting our diagnosis of the issue.

After much analysis, while it looks similar to a previous CodeDeploy agent issue, it's actually a much more nuanced edge case that's resulting in similar behavior.

The core issue is that the agent is running into malformed / non-standard zip binaries during the DownloadBundle step, and this results in a partial unpack or a full failure of the unpack. When that scenario occurs, we have some failover logic that attempts to salvage the situation, and that logic was actually generating a failure under these specific conditions.
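To illustrate the failure mode described above, here is a small simulation. No real archive is read, and both extractors are stand-ins written for this example rather than the agent's code. The first step writes part of its output and then fails; the fallback refuses to overwrite existing files, so a naive failover ends in "already exists". Clearing the partial output before retrying is one possible way around that, shown here as an assumption and not necessarily what v1.1.2 does.

```shell
#!/bin/sh
# Simulation only: the extractors below are stand-ins, not the agent's code.

# Stand-in for the system unzip step: writes one file, then fails partway.
primary_extract() {
  mkdir -p "$1"
  echo "partial" > "$1/appspec.yml"
  return 2
}

# Stand-in for a library extractor that refuses to overwrite existing files.
fallback_extract() {
  mkdir -p "$1"
  if [ -e "$1/appspec.yml" ]; then
    echo "Destination '$1/appspec.yml' already exists" >&2
    return 1
  fi
  echo "complete" > "$1/appspec.yml"
}

# Failover without cleanup: the fallback trips over the partial output.
extract_naive() {
  primary_extract "$1" || fallback_extract "$1"
}

# Failover with cleanup: remove partial output before running the fallback.
extract_with_cleanup() {
  [ -n "$1" ] || return 1
  if primary_extract "$1"; then
    return 0
  fi
  rm -rf "$1"
  fallback_extract "$1"
}
```

Calling extract_naive on a scratch directory fails with the "already exists" message, while extract_with_cleanup on another scratch directory succeeds.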

We believe we have a good solution to the issue and we're working to prepare a v1.1.2 release of the agent that addresses it.

Would any of you be interested in helping us test a pre-release of v1.1.2 to validate that it addresses the issue from your perspective?

@brblck - I certainly would be interested in testing the pre-release of v1.1.2, yes.

In regards to your comment, and @AnandarajuCS's, the .zip files in this case were/are generated by Atlassian Bamboo, so it's interesting to hear that the deployment bundles could be malformed/non-standard.

Aside from this, I'm certainly happy to test!

@davidreid thanks so much. The team is working to prepare a pre-release and validation instructions as we speak.