Adding a start/stop mode to terminate instances as required without waiting 60 minutes for the TTL to expire.

Question

Closed this issue a month ago · 3 comments

Coditas-Sanket-Rajgiri commented 2 months ago

There should be a start/stop mode for starting the instance when required and stopping it when its job is done. The TTL is good, but it isn't useful if my job is done in 5 minutes: why would I pay for the remaining 55 minutes when the minimum charge per instance is 1 minute? Please look into this ASAP; with it, this action will be unbeatable.

Answer 1 · 2024-09-30T17:31:05.000Z

Instances are supposed to terminate about 60 seconds after job completion (failure or success). The 60-second grace period gives the code a chance to call the GitHub API and perform the necessary cleanup.

Are you saying instances stay around after a job has completed?

Answer 2 · 2024-09-30T17:49:50.000Z

      `echo "shutdown -P +1" > $CURRENT_PATH/shutdown_script.sh`,
      "chmod +x $CURRENT_PATH/shutdown_script.sh",
      `echo "./config.sh remove --token ${runnerRegistrationToken.token} || true" > $CURRENT_PATH/shutdown_now_script.sh`,
      // Append (>>) so the runner-removal line written above is not overwritten.
      `echo "shutdown -h now" >> $CURRENT_PATH/shutdown_now_script.sh`,
      "chmod +x $CURRENT_PATH/shutdown_now_script.sh",
      "export ACTIONS_RUNNER_HOOK_JOB_COMPLETED=$CURRENT_PATH/shutdown_script.sh",

The code above is part of our startup cloud-init script.

It creates a shutdown script and then uses ACTIONS_RUNNER_HOOK_JOB_COMPLETED to make sure it is executed once a job finishes.
We also have github_job_start_ttl_seconds, which defines how long an instance is allowed to stay idle while waiting for a job to start.
Finally, we have the instance TTL, which acts as a fallback if the two mechanisms above both fail for any reason.
I just tested this with a job that had an error intentionally introduced to make it fail. Exactly 1 minute after the failure, the instance was terminated.
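
To make the interaction between the three mechanisms concrete, here is a rough sketch of when an instance would be expected to terminate. It is only a model of the behaviour described above, not code from the action: the function name, the parameter names other than the github_job_start_ttl_seconds idea, and the 600-second idle window are assumptions for illustration. All times are seconds since the instance booted.

```javascript
// Hypothetical model of the three termination layers described above.
// All times are seconds since the instance booted.
function expectedTerminationSeconds({
  jobStartedAt = null,   // null means no job was ever picked up
  jobFinishedAt = null,  // null means the job never completed
  jobStartTtlSeconds,    // idle window, in the spirit of github_job_start_ttl_seconds
  instanceTtlSeconds,    // overall instance TTL (the last-resort fallback)
  graceSeconds = 60,     // delay from "shutdown -P +1" in the job-completed hook
}) {
  // No job started within the idle window: the idle TTL ends the instance.
  if (jobStartedAt === null || jobStartedAt > jobStartTtlSeconds) {
    return Math.min(jobStartTtlSeconds, instanceTtlSeconds);
  }
  // A job started but never finished: only the instance TTL applies.
  if (jobFinishedAt === null) {
    return instanceTtlSeconds;
  }
  // Normal case: the job-completed hook shuts down after the grace period.
  return Math.min(jobFinishedAt + graceSeconds, instanceTtlSeconds);
}

// A job that starts 30 s after boot and finishes at the 5-minute mark,
// with an assumed 600 s idle window and a 60-minute instance TTL.
const fiveMinuteJob = expectedTerminationSeconds({
  jobStartedAt: 30,
  jobFinishedAt: 300,
  jobStartTtlSeconds: 600,
  instanceTtlSeconds: 3600,
});
console.log(fiveMinuteJob); // 360
```

Under these assumptions, a 5-minute job leaves the instance running for about 6 minutes in total, not the full 60-minute TTL.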

Do you have an example of a workflow that could trigger a different type of failure?

Answer 3 · 2024-10-06T11:18:47.000Z

Yes, this works. Thank you so much, I will use this in my workflow.