bsdpot/pot

[BUG] pot-stop, called from pot-start, might operate on the wrong jail.

grembo opened this issue · 8 comments

Describe the bug
Calls to pot-stop <jailname> from pot-start might operate on the wrong jail.

This is part of why #200 fixed nomad problems, as a jail that was restarted, was partially dismantled by pot-stop called from pot-start. With #200 this is less of a problem, as each pot gets a truly unique name. It might be a problem with non-nomad jails though.

It seems like that this might also be the underlying reason why I perceived behavior that made me create #201.

To Reproduce
Steps to reproduce the behavior:

  1. Create a non-persistent pot using sleep as the pot-cmd
  2. Start that pot just at the right moment so that the invocations overlap
  3. Perceive undefined behavior

Expected behavior

  • pot-stopped should include the pot's jail id in the pot-stopped file.
  • start-cleanup should call pot-cmd with an optional jid parameter
  • if the jid parameter is given, pot-stop should only operate on pname if the jid matches

Hi. Thanks again for reporting.
The bug is triggered because the clean-up operations take long enough to allow the same jail to be started again.
In this small period of time, the second start allow the pot to start, because there are non running, but the clean-up operations of the previous start will call a stop that then kill the recently started pot.

I see two possible solutions:

  • a sync of some sort, allowing the second start to proceed only after the clean-up of the first one is performed
  • a refactor of js_stop: start will use only the second half

I honestly prefer the first solution, to avoid further race conditions with post-stop and pre-start hooks.

An additional observation related to this issue: I've seen a case where parallel cleanup of one pot destroyed an epair interface a second time, which had been re-created to be used by a new pot in the meantime - timing was unfortunate, so the new pot ended up running with its epair interface removed from it post-start. So yes, clean locking is the way forward here.

@pizzamig I found another race in the kernel (13.0 and 13.1)

Looking at the kernel sources, I find it unlikely that the code will race-free anytime soon (it looks more like a rewrite would be required). This means we need some really heavy handed locking.

I created a script to trigger the race mentioned above using pot (this is similar to what happens when using nomad). We can use this to validate our locking solution:

#!/bin/sh

POT="bin/pot"
CLONENUM=5
CLONES="$(jot $CLONENUM)"

if ! $POT show -p cloneme >/dev/null 2>/dev/null; then
  $POT create -p cloneme -b 13.0 -t single -N public-bridge
  $POT snapshot -p cloneme
fi

for i in $CLONES; do
  if ! $POT show -p clone$i >/dev/null 2>/dev/null; then
    $POT clone -p clone$i -P cloneme
    $POT set-attr -A persistent -V no -p clone$i
    $POT set-cmd -c "sleep $(jot -r 1 0.1 0.9)" -p clone$i
  fi
done

sync

while true; do
  for i in $CLONES; do
    ($POT start clone$i&
    $POT stop clone$i)&
    pids="$pids $!"
  done
  wait $pids
  pids=""
done

Thanks for reporting and for the script, it's a good way to reproduce it.
I'll try to implement some mutex/locking mechanism to let start/stop/cleanup to run one at a time

I think the key is to clearly identify all resources belonging to one pot and then lock per pot/resource group. I can support at some point in April.

this is how I envision the solution: first split start as 3 operations: starting, started, post-start-clean-up
When starting, no stop or another starting is allowed
stop is only allowed when started, so post-start-clean-up and stop needs to be mutually exclusive
starting cannot happen during post-start-clean-up and stop

Now, post-start-clean-up and stop are almost the same and they can be easily refactored in pot stop.
The complicated part is implement in sh the rest of the synchronization.
One idea, is to track the status of the pot in the config file, or in a status file (starting, started, stopping, stopped, and whatever is needed...), even if it's not a very atomic operation and moving from started to stopped could have a race condition, but if stopand post-start-clean-up are equivalent, it shouldn't be an issue. I have an idea on how to use lockf for it but it could take some time....

This solution should make the `/tmp/pot_stopped{_pname}' file redundant.

stop is only allowed when started, so post-start-clean-up and stop needs to be mutually exclusive
starting cannot happen during post-start-clean-up and stop

I assume all of that would be per pot, right?

I'm always a bit torn, as for some of these things having a daemon would make life easier (and also better in terms of performance), but losing the ability to chickly hack the scriptwork is very useful.

I'll see if I have a chance to do some PoC work on this later in April.

@pizzamig I can't believe that three months passed since - I started a review on the feature to finally push it forward.