NextChapterSoftware/ec2-action-builder

Add a start/stop mode to terminate instances as required without waiting 60 minutes for the TTL to expire

Closed this issue · 3 comments

There should be a start/stop mode that starts the instance when required and stops it when its job is done. The TTL is good, but it is not useful if my job finishes in 5 minutes: why would I pay for the remaining 55 minutes when the minimum charge per instance is 1 minute? Please look into this ASAP; with it, this action will be unbeatable.

Instances are supposed to terminate about 60 seconds after job completion (failure or success). The 60-second grace period gives the code a chance to call the GitHub API and perform the necessary cleanup.

Are you saying instances stay around after a job has completed?

      `echo "shutdown -P +1" > $CURRENT_PATH/shutdown_script.sh`,
      "chmod +x $CURRENT_PATH/shutdown_script.sh",
      `echo "./config.sh remove --token ${runnerRegistrationToken.token} || true" > $CURRENT_PATH/shutdown_now_script.sh`,
      `echo "shutdown -h now" >> $CURRENT_PATH/shutdown_now_script.sh`,
      "chmod +x $CURRENT_PATH/shutdown_now_script.sh",
      "export ACTIONS_RUNNER_HOOK_JOB_COMPLETED=$CURRENT_PATH/shutdown_script.sh",

The code above is part of our startup cloud-init script.

  • It creates a shutdown script and then uses ACTIONS_RUNNER_HOOK_JOB_COMPLETED to make sure the script is executed once a job finishes.
  • We also have github_job_start_ttl_seconds, which defines how long an instance is allowed to stay idle before a job is executed.
  • Finally, we have the instance TTL, which takes effect if the two options above both fail for any reason.

I just tested with a job that had an error intentionally introduced to make it fail. Exactly 1 minute after the failure, the instance was terminated.
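
To make the interaction of these three layers concrete, here is a rough TypeScript model of when an instance would go away. It is only a sketch and not code from the action: the names `TerminationConfig`, `JobTimeline`, and `secondsUntilTermination` are hypothetical, and the 300-second idle limit is an assumed value chosen for illustration (the 60-second grace period and 60-minute TTL are the figures mentioned in this thread).

```typescript
// Hypothetical model of the three termination layers; illustrative names only.
interface TerminationConfig {
  graceSeconds: number;       // delay between job completion and shutdown
  jobStartTtlSeconds: number; // stands in for github_job_start_ttl_seconds
  instanceTtlSeconds: number; // overall instance TTL (last-resort safety net)
}

// All times are seconds since instance launch.
interface JobTimeline {
  jobStartedAt?: number;  // undefined if no job was ever picked up
  jobFinishedAt?: number; // undefined if the completion hook never fired
}

function secondsUntilTermination(cfg: TerminationConfig, job: JobTimeline): number {
  // The instance TTL always applies, whatever else happens.
  const candidates: number[] = [cfg.instanceTtlSeconds];

  if (job.jobStartedAt === undefined || job.jobStartedAt > cfg.jobStartTtlSeconds) {
    // No job started within the idle window: the idle limit applies.
    candidates.push(cfg.jobStartTtlSeconds);
  } else if (job.jobFinishedAt !== undefined) {
    // Job finished (success or failure): shutdown after the grace period.
    candidates.push(job.jobFinishedAt + cfg.graceSeconds);
  }

  // Whichever layer triggers first ends the instance.
  return Math.min(...candidates);
}

// Assumed values: 300 s idle limit is illustrative only.
const cfg: TerminationConfig = {
  graceSeconds: 60,
  jobStartTtlSeconds: 300,
  instanceTtlSeconds: 3600,
};

// A 5-minute job that starts 20 s after launch and ends at 300 s.
console.log(secondsUntilTermination(cfg, { jobStartedAt: 20, jobFinishedAt: 300 })); // 360
// No job ever starts: the idle limit applies.
console.log(secondsUntilTermination(cfg, {})); // 300
// Job starts but the completion hook never fires: only the TTL is left.
console.log(secondsUntilTermination(cfg, { jobStartedAt: 20 })); // 3600
```

Under these assumptions, the 5-minute job from the original question would keep the instance alive for roughly 6 minutes rather than the full 60.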

Do you have an example of a workflow that could trigger a different type of failure?

Yes, this works. Thank you so much, I will use this in my workflow.