breaking release V2 to support multi-runners
npalm opened this issue ยท 22 comments
We have implemented mult-runners! With this change on the branch next. we combine other changes that are breaking to the current v1* release. In this issue we tracking all the PR part of the upcoming v2 release. Till that time, we push out v2.0.0-next.* releases for testing purpose. Currently most of the work is finished. We expect to merge all changes back by the end of December.
Once the work of next is finished we will apply the follong changes
- The next branch will be merged back with a merge commit to develop
- Semantic release will be replaced with release-please. This will help to keep a better track on new release via a release branch.
- The develop branches will be removed, main will become the default.
- A branch v1 will be created, we will not actively maintain this branch. But we will do our best to merge PR's for some time.
All the help and feedback is welcome, feel free to raise improvement PR's to the branch next.
- #2472
- #2521
- #2519
- #2569
- #2639
- #2644
- #2685
- #2690
- #2705
- #2736
- #2761
- #2763
- #2784
- #2809
- Gather test feedback
- create V1 branch
- update main REAMDE with breaking changes notice and V1 reference
Migration
Application tagged namespaces
In #2182 the Application tag on the runner is namespace, in this release the support for the old tag is removed (#2705) and only the new tag ghr:Application
is used for scaling down. To ensure that olds instances got removed by the scale-down function it is important to first upgrade to 1.17. Alternative you can stop them manually.
Terraform 1.3+ required
We drop support for Terraform versions between lower then 1.3. This means versions 1.0.x till 1.2.x are not longer supported. The impact of upgrading from any Terraform version above 1 to 1.3.x should be minimal.
SSM Paths changed - Update required for tailored start-runner script (#2569)
Migration is only required when using your tailored start-runner script for example when building your own AMI. Before all parameters were based on convention. We have update them to be predictable with still similar conventions as default.
We set the tag ghr:ssm_config_path
as EC2 instance tag to the path used for the configuration. In the config path we store the same values as before (run_as
, enable_cloudwatch
, agent_mode
). The tokens are stored in a separate location which can be retrieved by looking up the value token_path
in the config.
For a more details example check-out the following diffs:
Multi runner capability (#2519)
- The multi-runner is available as a stand-alone module. However the change introduced a breaking change on the main module. The variable
runner_enable_workflow_job_labels_check
is not available under this module as we now only support the flagenable_runner_workflow_job_labels_check_all
to check all/partial workflow labels to be present in the runner labels. Running TF apply on existing deployment will apply some changes done in internal modules to existing infrastructure. - Impact on using internal (submodules). There are some changes in the webhook internal module as per this feature. So, If you have been using it directly, the changes will be towards accepting runner configuration (queue details and label matchers) as a single object instead of earlier configuration of scattered queue details and criteria to match the workflow labels against the runner labels.
Removal support check_run (#1256
In the past we used for scaling the check_run event, since GitHub had now support for a workflow_job event. Now the event is upport for quite some time. And also support exists on GHES. We will phase out check_run support.
Ensure you configure the GitHub app to subscribe to workflow_job. In case you on an old GHES server version, the only. migration path is to update GHES or remain on old version of the module
Removal old scale-down mechanism
Migration is only required when you are on an older version than 0.19.0.
- Option 1, upgrade to 2.x could cause orphan instances which will not be terminated by the scale-down function, thoses instances needs to be removed manual.
- Option 2, upgrade first to a verions 1.17+ and keep this version running till all instances before the upgrade are terminated. Next upgrade to 2.x.
Default Node version and lambda architecture
The default for node is set to node 18 (LTS) and the default for the lambda architecture is changed from x64 to arm64. This architecture change has no impact on the runners (#2763).
Variables
Depcreated variables are removed from the module.
- environment -> prefix
- instance_type -> instance_types
- market_options -> instance_target_capacity_type
The following variables are renamed for consistently. Old variables names will trigger an error with upgrade instruction.
Looking forward to this change.
First prerelease is out: https://github.com/philips-labs/terraform-aws-github-runner/releases/tag/v2.0.0-next.1
Looking forward to this change.
Do you have time to test the pre-release? Any help to check the branch would be very welcome.
Looking forward to this change.
Do you have time to test the pre-release? Any help to check the branch would be very welcome.
I will try, but I might not have time. I am currently working part-time, as I learn to deal with Parkinson's and loss of vision in one eye.
@npalm Since this will be a major release (in the semver sense), would it be possible to also bump minimum terraform version to 1.3.0+
, to support optional
, so we could make e.g. the runner block_device_mappings
variable more user friendly, by leveraging default values for some object attributes?
@jpalomaki good point, we kept the main module as backward compatible as we can. But agree would help a lot of other refactoring. Feel free to make a small PR to the next branch.
@npalm All right, I will scan the modules for places where a later terraform version might help, and then post a PR
Hey, thanks for the work on this, would you have an idea of an ETA?
Here's the terraform version version bump: #2639. I will do a follow-up PR for the EBS volume mapping variable to use optionals
@npalm Here's the EBS volume mappings variable refactoring: #2664. I think it ought to be backwards-compatible. Do the automated tests cover the usage of this variable already?
Unit tests does not cover this, EBS block mappings is Terraform only and creates a launch template from which the unit tests are not aware. We have currently not automated end-to-end test.
Created another pre-release vor 2.x
https://github.com/philips-labs/terraform-aws-github-runner/releases/tag/v2.0.0-next.2
Create a new release for fixing prefixes in SSM.
https://github.com/philips-labs/terraform-aws-github-runner/releases/tag/v2.0.0-next.3
https://github.com/philips-labs/terraform-aws-github-runner/releases/tag/v2.0.0-next.5
Only 2 PRs out to upgrade node, remove depcrecation and set default lambda arch to arm.
@npalm Minor thing: how about renaming variable enabled_userdata -> enable_userdata
to be consistent with the other enable_*
variables?
Very much looking forward to this update. I likely won't be able to test before the new year (unsure when you're planning on releasing) but if it's still prerelease in a month or so I'd be happy to give it a go ๐
@npalm Minor thing: how about renaming variable enabled_userdata ->
enable_userdata
to be consistent with the otherenable_*
variables?
Thanks for spotting, feel free to raise a PR. I planning to merge next back in the last week of the year.
Good news, the branch is ready to merge. Cleanup up develop amd main.