Test case with picocom channel fails with "some subprocess(es) did not stop"
Closed this issue · 5 comments
Very simple test case fails with error:
$ newbot ts.hello_world.hi -c config.temp_config
tbot starting ...
├─Calling hi ...
│ ├─[local] picocom -b 115200 /dev/ttyUSB1
│ ├─[dm-board-linux] echo Hello World
│ │ ## Hello World
│ ├─test case finished
│ └─Fail. (1.717s)
├─Warning: Found dangling <class 'config.temp_config.DmBoardLinux'> instance in this context
├─Warning: Found dangling <class 'config.temp_config.DmBoard'> instance in this context
├─Warning: Teardown went wrong! A <class 'config.temp_config.DmBoardLinux'> instance is still alive.
│ Please report this to https://github.com/rahix/tbot/issues!
├─Warning: Teardown went wrong! A <class 'config.temp_config.DmBoard'> instance is still alive.
│ Please report this to https://github.com/rahix/tbot/issues!
├─Exception:
│ Traceback (most recent call last):
│ File "/usr/lib/python3.8/contextlib.py", line 510, in exit
│ if cb(*exc_details):
│ File "/usr/lib/python3.8/contextlib.py", line 120, in exit
│ next(self.gen)
│ File "/home/vemelin/.local/lib/python3.8/site-packages/tbot/machine/connector/common.py", line 132, in _connect
│ yield ch
│ File "/home/vemelin/.local/lib/python3.8/site-packages/tbot/machine/channel/channel.py", line 561, in exit
│ self.close()
│ File "/home/vemelin/.local/lib/python3.8/site-packages/tbot/machine/channel/channel.py", line 538, in close
│ self._c.close()
│ File "/home/vemelin/.local/lib/python3.8/site-packages/tbot/machine/channel/subprocess.py", line 130, in close
│ raise tbot.error.TbotException("some subprocess(es) did not stop")
│ tbot.error.TbotException: some subprocess(es) did not stop
├─────────────────────────────────────────
└─FAILURE (1.789s)
code hello_world.py:
import tbot
@tbot.testcase
def hi():
with tbot.ctx.request(tbot.role.BoardLinux) as lnx:
lnx.exec0("echo",
"Hello",
"World")
tbot.log.message(f"test case finished")
code temp_config.py:
import tbot
from tbot.machine import connector, board, linux
class DmBoard(connector.ConsoleConnector, board.Board):
baudrate = 115200
serial_port = "/dev/ttyUSB1"
def connect(self, mach):
return mach.open_channel("picocom", "-b", str(self.baudrate), self.serial_port)
class DmBoardLinux(board.Connector, linux.Ash):
pass
def register_machines(ctx: tbot.Context):
ctx.register(DmBoard, tbot.role.Board)
ctx.register(DmBoardLinux, tbot.role.BoardLinux)
You need to fix it by increasing wait time in
tbot/tbot/machine/channel/subprocess.py
line 118
Why did you limit exponential backoff to 1 sec? It could be greater without affecting anything else
For me for t in range(11):
works perfect. range(7)
not enough.
Hi,
the backoff limit was intentionally chosen to be quite low because usually, when subprocesses don't stop, the correct solution is to add code that cleanly terminates the offending processes. In your case, the solution is probably to send a termination escape sequence to picocom:
def connect(self, mach):
ch = mach.open_channel("picocom", "-b", str(self.baudrate), self.serial_port)
try:
yield ch
finally:
ch.sendcontrol("A")
ch.sendcontrol("Q")
That said, I'd be open to extending the backoff time a bit but displaying a warning after one second that something isn't right.
@Rahix thanks for your suggestion (just contextlib.contextmanager decorator is needed to add). It works, but it works even this way:
@contextlib.contextmanager
def connect(self, mach):
ch = mach.open_channel("picocom", "-b", str(self.baudrate), self.serial_port)
yield ch
Didn't dive deep to it.
And in both cases it can raise tbot.error.ChannelClosedError if spam this test very often. Maybe it's not important but increasing backoff time doesn't have this problem (for t in range(8):
was enough for me)
So maybe worth it to extend backoff time or to add the trick above to documentation. Up to you.
but it works even this way:
Yeah, it is a timing race between tbot's timeout and your picocom process stopping. It will work some days and some days it will fail.
To avoid ugly situations like this, I have taken great care to not make use of timeouts in tbot whereever possible. You ran into one of the few remaining ones...
I have now created #120 to hopefully improve on this. Please let me know if this works for you.
or to add the trick above to documentation
So I think the real solution here is to add a dedicated picocom connector to tbot, which implements this behavior instead of just documenting what people need to do.
And actually, this picocom connector already exists - in a way: The AutoConsoleConnector
from tbot_contrib
includes support for picocom
, using the exact code I showed above. But as I just noticed, this connector never made it into the documentation 🫣 Need to fix that asap...
AutoConsoleConnector
is documented now: https://tbot.tools/contrib/connector.html#tbot_contrib.connector.auto.AutoConsoleConnector