Document the use of CTRL_BREAK_EVENT on Windows instead of SIGINT to interrupt workload executors
prashantmital opened this issue · 2 comments
There are many limitations with using signal.CTRL_C_EVENT to interrupt a subprocess on Windows. Consider, for example, the following scripts:
pyscript.py (analogous to 'the framework', i.e. astrolabe):
import subprocess
import os
import signal
import sys
import time
cmd = subprocess.Popen([sys.executable, "bgproc.py"],
                       creationflags=subprocess.CREATE_NEW_PROCESS_GROUP,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
time.sleep(2)
os.kill(cmd.pid, signal.CTRL_C_EVENT)
stdout, stderr = cmd.communicate(timeout=10)
print("stdout: {}".format(stdout))
print("stderr: {}".format(stderr))
print("exit code: {}".format(cmd.returncode))
bgproc.py (analogous to a driver workload executor script):
import signal
print("hello world")
try:
    while True:
        pass
except KeyboardInterrupt:
    print("caught ctrl-c!")
    exit(0)
Running python.exe pyscript.py, we'd expect to see bgproc.py's execution interrupted by the CTRL_C_EVENT signal, which is 'handled' in the except KeyboardInterrupt block. However, we actually find that the script is not interrupted at all by the signal, causing the call to communicate to time out:
$ C:/python/Python37/python.exe pyscript.py
Traceback (most recent call last):
File "pyscript.py", line 13, in <module>
stdout, stderr = cmd.communicate(timeout=10)
File "C:\python\Python37\lib\subprocess.py", line 964, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "C:\python\Python37\lib\subprocess.py", line 1298, in _communicate
raise TimeoutExpired(self.args, orig_timeout)
subprocess.TimeoutExpired: Command '['C:\\python\\Python37\\python.exe', 'bgproc.py']' timed out after 10 seconds
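Note that after this traceback the child is still spinning in its loop, because nothing ever stopped it. A minimal sketch of cleaning up in that case follows, using the kill-then-communicate pattern from the subprocess documentation. The helper name communicate_or_kill is hypothetical and not part of astrolabe.

```python
import subprocess


def communicate_or_kill(proc, timeout):
    """Hypothetical helper: wait for proc, force-kill it if it does not exit in time.

    Returns (timed_out, stdout, stderr).
    """
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        # The interrupt was never delivered (or was ignored), so terminate
        # the child forcibly and collect whatever it wrote to its pipes.
        proc.kill()
        stdout, stderr = proc.communicate()
        timed_out = True
    return timed_out, stdout, stderr
```

This only avoids leaking the child process; it does not give the child a chance to run any cleanup code, which is why a working interrupt mechanism is still needed.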
After observing this peculiar behavior, I investigated further and found that on Windows there are many deficiencies with the IPC APIs. The situation is further complicated by deficient/incorrect Python documentation (specifically, regarding the correct usage of CTRL_C_EVENT, CTRL_BREAK_EVENT, CREATE_NEW_PROCESS_GROUP, and os.kill on Windows). Some resources with pertinent information/discussions are:
- https://stefan.sofa-rockers.org/2013/08/15/handling-sub-process-hierarchies-python-linux-os-x/
- https://stackoverflow.com/questions/7085604/sending-c-to-python-subprocess-objects-on-windows
- https://stackoverflow.com/questions/26578799/send-sigint-to-python-subprocess-using-os-kill-as-if-pressing-ctrlc
- https://stackoverflow.com/questions/47306805/signal-sigterm-not-received-by-subprocess-on-windows
In light of this, we need a new way to stop the Workload Executor on Windows.
After some more digging, it seems that using the CTRL_BREAK_EVENT signal is the right way to kill process groups on Windows. The following combination of scripts works:
pyscript.py (analogous to 'the framework', i.e. astrolabe):
import subprocess
import os
import signal
import sys
import time
cmd = subprocess.Popen([sys.executable, "bgproc.py"],
                       creationflags=subprocess.CREATE_NEW_PROCESS_GROUP,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
time.sleep(2)
os.kill(cmd.pid, signal.CTRL_BREAK_EVENT)
stdout, stderr = cmd.communicate(timeout=10)
print("stdout: {}".format(stdout))
print("stderr: {}".format(stderr))
print("exit code: {}".format(cmd.returncode))
bgproc.py (analogous to a driver workload executor script):
import signal
print("hello world")
def cleanup(signum, frame):
    print("caught ctrl-break!")
    exit(0)
signal.signal(signal.SIGBREAK, cleanup)
while True:
    pass
exit(0)
This works as expected:
$ C:/python/Python37/python.exe pyscript.py
stdout: b'hello world\r\ncaught ctrl-break!\r\n'
stderr: b''
exit code: 0