[BUG] pot-stop, called from pot-start, might operate on the wrong jail.
grembo opened this issue · 8 comments
Describe the bug
Calls to pot-stop <jailname>
from pot-start might operate on the wrong jail.
This is part of why #200 fixed nomad problems, as a jail that was restarted, was partially dismantled by pot-stop called from pot-start. With #200 this is less of a problem, as each pot gets a truly unique name. It might be a problem with non-nomad jails though.
It seems like that this might also be the underlying reason why I perceived behavior that made me create #201.
To Reproduce
Steps to reproduce the behavior:
- Create a non-persistent pot using sleep as the pot-cmd
- Start that pot just at the right moment so that the invocations overlap
- Perceive undefined behavior
Expected behavior
- pot-stopped should include the pot's jail id in the pot-stopped file.
- start-cleanup should call pot-cmd with an optional jid parameter
- if the jid parameter is given, pot-stop should only operate on pname if the jid matches
Hi. Thanks again for reporting.
The bug is triggered because the clean-up operations take long enough to allow the same jail to be started again.
In this small period of time, the second start allow the pot to start, because there are non running, but the clean-up operations of the previous start will call a stop that then kill the recently started pot.
I see two possible solutions:
- a sync of some sort, allowing the second start to proceed only after the clean-up of the first one is performed
- a refactor of js_stop: start will use only the second half
I honestly prefer the first solution, to avoid further race conditions with post-stop and pre-start hooks.
An additional observation related to this issue: I've seen a case where parallel cleanup of one pot destroyed an epair interface a second time, which had been re-created to be used by a new pot in the meantime - timing was unfortunate, so the new pot ended up running with its epair interface removed from it post-start. So yes, clean locking is the way forward here.
@pizzamig I found another race in the kernel (13.0 and 13.1)
Looking at the kernel sources, I find it unlikely that the code will race-free anytime soon (it looks more like a rewrite would be required). This means we need some really heavy handed locking.
I created a script to trigger the race mentioned above using pot (this is similar to what happens when using nomad). We can use this to validate our locking solution:
#!/bin/sh
POT="bin/pot"
CLONENUM=5
CLONES="$(jot $CLONENUM)"
if ! $POT show -p cloneme >/dev/null 2>/dev/null; then
$POT create -p cloneme -b 13.0 -t single -N public-bridge
$POT snapshot -p cloneme
fi
for i in $CLONES; do
if ! $POT show -p clone$i >/dev/null 2>/dev/null; then
$POT clone -p clone$i -P cloneme
$POT set-attr -A persistent -V no -p clone$i
$POT set-cmd -c "sleep $(jot -r 1 0.1 0.9)" -p clone$i
fi
done
sync
while true; do
for i in $CLONES; do
($POT start clone$i&
$POT stop clone$i)&
pids="$pids $!"
done
wait $pids
pids=""
done
Thanks for reporting and for the script, it's a good way to reproduce it.
I'll try to implement some mutex/locking mechanism to let start/stop/cleanup to run one at a time
I think the key is to clearly identify all resources belonging to one pot and then lock per pot/resource group. I can support at some point in April.
this is how I envision the solution: first split start
as 3 operations: starting
, started
, post-start-clean-up
When starting
, no stop
or another starting
is allowed
stop
is only allowed when started
, so post-start-clean-up
and stop
needs to be mutually exclusive
starting
cannot happen during post-start-clean-up
and stop
Now, post-start-clean-up
and stop
are almost the same and they can be easily refactored in pot stop
.
The complicated part is implement in sh
the rest of the synchronization.
One idea, is to track the status of the pot
in the config file, or in a status file (starting
, started
, stopping
, stopped
, and whatever is needed...), even if it's not a very atomic operation and moving from started
to stopped
could have a race condition, but if stop
and post-start-clean-up
are equivalent, it shouldn't be an issue. I have an idea on how to use lockf
for it but it could take some time....
This solution should make the `/tmp/pot_stopped{_pname}' file redundant.
stop is only allowed when started, so post-start-clean-up and stop needs to be mutually exclusive
starting cannot happen during post-start-clean-up and stop
I assume all of that would be per pot, right?
I'm always a bit torn, as for some of these things having a daemon would make life easier (and also better in terms of performance), but losing the ability to chickly hack the scriptwork is very useful.
I'll see if I have a chance to do some PoC work on this later in April.