[BUG] Pots fail to start if recently stopped
davidchisnall opened this issue · 3 comments
Describe the bug
When I rapidly stop and then start a pot (either via an explicit stop or via a blocking pot's controlling process exiting), I get this error:
Failed to create Session: failed to create session with status 409
jail: /tmp/tinirc: failed
I haven't been able to find this text in the installed pot files, so I don't know where it came from. After a few attempts, it succeeds.
To Reproduce
Steps to reproduce the behavior:
I have a script that looks like this:
while [ -f /var/run/github-runners ] ; do
    # Acquire a lock while cloning to prevent races against an update of the base image.
    lockf -k /var/run/github-runners.${RUNNER_NAME}.lock pot clone -F -P ${RUNNER_NAME} -p ${RUNNER_CLONE_NAME}
    pot start -p ${RUNNER_CLONE_NAME}
    pot destroy -p ${RUNNER_CLONE_NAME}
done
The loop runs around 20 times failing to start after the runner exits, then succeeds.
Expected behavior
Telling a pot to start should tell the pot to start.
System configuration - if possible
The only change from the sample is to specify my NIC.
$ diff /usr/local/etc/pot/pot.conf /usr/local/etc/pot/pot.conf.sample
33d32
< POT_EXTIF=hn0
$ cat /etc/pf.conf
nat-anchor pot-nat
rdr-anchor "pot-rdr/*"
$ potnet show -v
17:41:59 [ INFO] Insert network Some(10.192.0.0/10)
17:41:59 [ INFO] Insert broadcast Some(10.192.0.0/10)
17:41:59 [ INFO] Insert gateway Some(10.192.0.1)
17:41:59 [ INFO] Insert dns Some(10.192.0.2)
Network topology:
network : 10.192.0.0/10
min addr: 10.192.0.0
max addr: 10.255.255.255
Addresses already taken:
10.192.0.0
10.192.0.1 default gateway
10.192.0.2 dns
10.255.255.255
Debug information
SystemConf {
zfs_root: Some(
"zroot/pot",
),
fs_root: Some(
"/opt/pot",
),
network: Some(
10.192.0.0/10,
),
netmask: Some(
255.192.0.0,
),
gateway: Some(
10.192.0.1,
),
ext_if: Some(
"hn0",
),
dns_name: Some(
"dns",
),
dns_ip: Some(
10.192.0.2,
),
}
Maybe you could share with us what is in the pot you're starting (and its configuration file)?
This sounds like some rate limiting/max number of simultaneous sessions issue (potentially from github actions?).
p.s.:
"The loop runs around 20 times failing to start after the runner exits, then succeeds."
20 times sounds a lot like the github actions' concurrent usage limit, https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration says:
Of course this is speculation, but 409 is definitely not from within jails and/or pot, so it must come from some API you're talking to...
The runner isn't starting. Ah, you're right. It looks as if GitHub requires the runner to wait for a period before reconnecting. The error message that I'm seeing is from it, not from pot.
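For anyone who hits the same 409, one possible workaround is to pause between attempts instead of looping as fast as possible. Below is a minimal sketch of a retry helper in POSIX sh. `retry_with_backoff` is a hypothetical name, not part of pot or the GitHub runner, and the attempt count and delay are placeholders, since this thread doesn't say how long GitHub expects a runner to wait before reconnecting.

```sh
#!/bin/sh
# Hypothetical helper (not part of pot or the GitHub runner): run a command,
# doubling the delay after each failure, until it succeeds or attempts run out.
# usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
retry_with_backoff() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    # "$@" is the command to run; loop for as long as it exits non-zero.
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
    return 0
}
```

In the loop from the report this could be called as `retry_with_backoff 6 5 pot start -p "${RUNNER_CLONE_NAME}"`. That only helps if the wrapped command exits non-zero when the runner fails to create its session, which is an assumption this thread doesn't confirm. If it doesn't, a plain `sleep` before the next `pot clone` would be the simpler option.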