Race condition on setup
Opened this issue · 7 comments
If two nodes converge at the same time, you can still hit a race condition: one node checks whether the DB setup is complete, finds no setup, and continues. Immediately after that check, a second node runs the setup, so the first node fails when it tries to run the setup again.
Maybe this is something we could encode into the checkmate.yml.
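To make the failing order concrete, here is a minimal sketch that replays it deterministically. `FakeDatabase` and `converge_interleaved` are hypothetical names invented for illustration; they stand in for the shared database and the setup step, and are not part of chef, checkmate, or the magento installer.

```python
class FakeDatabase:
    """Stand-in for the shared database; `installed` marks a finished setup."""

    def __init__(self):
        self.installed = False

    def is_installed(self):
        return self.installed

    def install(self):
        # Mimics an installer that refuses to run a second time.
        if self.installed:
            raise RuntimeError("setup already ran")
        self.installed = True


def converge_interleaved(db):
    """Replay the failing order: both nodes check before either installs."""
    a_needs_setup = not db.is_installed()  # node A checks, finds no setup
    b_needs_setup = not db.is_installed()  # node B checks, finds no setup
    results = {}
    # Node B installs first, right after node A's check; node A follows.
    for name, needs_setup in (("B", b_needs_setup), ("A", a_needs_setup)):
        try:
            if needs_setup:
                db.install()
            results[name] = "ok"
        except RuntimeError as error:
            results[name] = f"failed: {error}"
    return results


outcome = converge_interleaved(FakeDatabase())
print(outcome)
```

The check and the install are two separate steps, so nothing stops the second node from acting between them.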
I would add the DB node setup as a first required step in the workflow.
Or be sure to set up the other web nodes only once the first one has started.
Since the cron job should be running on only a single node, would it make sense to have a check for the cron recipe to be in the run_list, and if so, run the DB install? I'm not entirely familiar with the install script, so I'm not sure if it does more than DB work.
Also, I would like to solve this outside of checkmate since this is not necessarily a checkmate related issue.
@gondoi We could combine it with the crontab node; our end goal was to stop running the installer and use a script to load the data via checkmate anyway. Since you reported the issue, are you okay waiting for that, or is this race condition regularly breaking checkmate deployments at the moment?
It breaks pretty regularly since we are waiting for the cloud database to finish setting up, and as soon as it's done the magento nodes start the chef run at almost the exact same time and hit the magento setup at the same time. Although, we could probably wait for that since it's not 100% failure and we can retry the configure task once it fails.
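The retry workaround mentioned above could look roughly like this. It is only a sketch: `retry` and `flaky_configure` are hypothetical helpers, and the real configure task is assumed to raise an error when it loses the race.

```python
import time


def retry(task, attempts=3, delay_seconds=0.0):
    """Call `task` until it succeeds or `attempts` tries are used up."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except RuntimeError as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # give the other node time to finish
    raise last_error


calls = []


def flaky_configure():
    # Hypothetical configure task: loses the race once, then succeeds.
    calls.append(1)
    if len(calls) == 1:
        raise RuntimeError("setup already in progress")
    return "configured"


result = retry(flaky_configure)
print(result, len(calls))
```

This hides the failure rather than removing the race, which fits its use as a stopgap.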
Every object in chef is last-writer-wins. There is, at this time (nb: March 2013), no way to guard against this.
source
Ideally, we can get out of the business of running the magento setup script. I really think that's the best solution short of adding atomicity/concurrency features into chef itself or by writing a new resource that understands locking.
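As a rough idea of what a lock-aware approach might do, the sketch below lets nodes claim the setup atomically through a unique row in the shared database, so the check and the claim happen in one step. The `setup_lock` table, the `try_claim_setup` helper, and the node names are assumptions made for illustration, and an in-memory SQLite database stands in for the real cloud database.

```python
import sqlite3


def try_claim_setup(conn, node_name):
    """Return True only for the single node whose INSERT wins."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO setup_lock (id, owner) VALUES (1, ?)",
                (node_name,),
            )
        return True
    except sqlite3.IntegrityError:
        # Another node already holds the row with id = 1.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE setup_lock (id INTEGER PRIMARY KEY, owner TEXT NOT NULL)"
)

# Both nodes try to claim the setup; the primary key lets only one through.
claims = {node: try_claim_setup(conn, node) for node in ("web1", "web2")}
print(claims)
```

The node that loses the claim would still need to wait until the winner has finished the setup before continuing; that part is left out here.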