Race condition in `cookbook_preproc.yaml`
Closed this issue · 1 comments
If more than one copy of cookbook_preproc.yaml
runs at the same time, one will win and the other will do a lot of computation and then lose. This is because both will clone the cache, compute a bunch, and then overwrite the cache with their own data.
Only one copy of the workflow can run at a time on each branch/pr, with earlier copies being cancelled when later ones run, so this isn't a complete disaster. But I should still come up with some git fanciness to avoid this.
The problem would be a lot simpler if we were keeping history in the cache, but it should still be fixable.
Easy mitigation: Clone the cache right before deploying changes to minimise the time the race condition is possible.
Proper fix:
- Change cookbook_preproc's concurrency group to just be the workflow and the
DEPLOY_SUBDIR
. This way if two runs are going at the same time, they're definitely writing to different folders of the cache and so won't create merge conflicts. - When it's time to deploy, repeatedly until it succeeds:
- merge in any changes
- collapse the history
git push --force-with-lease
- wait a random amount of time (so that two runs deploying at exactly the same time eventually fall into an order)
Note to self - cookbook_pr_close.yaml
needs to be included in this mitigation/fix.
It might be easier and more robust to have a different tag/branch for each cache, rather than a different folder in the same branch - that plus one concurrency group per branch should solve the race condition without needed to mess around with merges. They could be named _cookbook_data_main
, _cookbook_data_PR15
etc. Would also simplify the cleanup action - it just needs to delete a branch, rather than make a commit.