department-of-veterans-affairs/abd-vro

Set up scheduled SecRel job


User Story

As a VRO engineer, I would like the SecRel process to run as a scheduled task every weekday on the latest code in the develop branch, so that I know whether intervention is needed to get any of the VRO images signed and the deployment pipeline stays unblocked.

Acceptance Criteria

  1. By "latest" above, we mean the tip of the develop branch.
  2. The SecRel signing workflow is automatically invoked once a day, every weekday.
  3. The results of the workflow are discoverable to VRO engineers.
  4. The workflow runs in GitHub Actions.
  5. A Slack notification is sent when there are issues.

Not included in this work

  • Alerting conditional on outcome of the workflow

Notes about work
This is a follow-up to the deployment improvement workshop (recap).

  • Mason has suggested that whoever works on this will need to dig into the "latest" tag on the image and why it does not correspond to the latest image in the repo.

Example of a scheduled run being performed, starting around 07:00 ET: https://github.com/department-of-veterans-affairs/abd-vro-internal/actions/runs/9711795641.

For any future modifications to the time at which the scan is performed: "secrel.yml" has a new section on line 11 titled "schedule", which contains a cron expression that schedules the runs at 1100 UTC, M-F.
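
For reference, a schedule trigger matching that description would look roughly like the fragment below. This is a sketch based only on the description above (1100 UTC, Monday through Friday), not a copy of "secrel.yml"; the surrounding keys and any other triggers in the real file are not shown here.

```yaml
on:
  schedule:
    # minute hour day-of-month month day-of-week
    # 11:00 UTC, Monday (1) through Friday (5)
    - cron: '0 11 * * 1-5'
```

GitHub Actions evaluates cron expressions in UTC, so 1100 UTC is 07:00 ET during daylight saving time and 06:00 ET during standard time. Scheduled workflows run against the latest commit on the repository's default branch.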

The implementation was complicated by CI failures with ep-merge, which have been intermittent for some time now. The team initially misdiagnosed the failures as being caused by the recent monitoring code, a conclusion that was reinforced when CI tests passed after the feature was reverted in a test branch. When the monitoring code was refactored, the CI tests still passed, but then failed on a subsequent run when the commits were merged into develop. After further investigation, we discovered that the ep-merge end-to-end tests were simply timing out after 1 second while calling svc-bip-api. We adjusted the timeout to 2 seconds and have not seen further CI failures.

Note that, according to the GitHub Actions documentation, "When the last user to commit to the cron schedule of a workflow is removed from the organization, the scheduled workflow will be disabled. If a user with write permissions to the repository makes a commit that changes the cron schedule, the scheduled workflow will be reactivated."