glenfant/stopit

Problem with very short timeouts, including zero

Opened this issue · 0 comments

I have a tool for which it makes sense to allow a timeout to be specified on the command line. It also makes sense for a user to try a timeout of zero to see what the timeout failure looks like. But this doesn't work: despite not setting swallow_exc to False, the TimeoutException gets raised. Digging into threading.Timer a bit, this appears to be the result of a 'feature', in that a timeout of 0 is treated specially, causing the stop function to be called from inside the call to Timer.start.

This in turn appears to trigger a race condition wrt the ___exit__ clause on BaseTimeout, based on evidence from the following experiment.

The attached code is a trivial use of stopit, with timeout supplied from the command line.

TL;DR
Illustrates the problem, reproduced on various systems. Also discovers a related problem on Windows 10, for very small values of the timeout

How to reproduce

Here are four example input/output pairs, two good to start, then two that illustrate the two problems:
Runs w/o timing out

>: while :; do python3 test_stopit.py 1; done
done 0
done 0
done 0
done 0
done 0
[repeats]

Times out

>: while :; do python3 test_stopit.py 0.01; done
done 2
done 2
done 2
done 2
done 2
[repeats]

Timeout of 0: always raises TimeoutException

>: while :; do python3 test_stopit.py 0; done
Traceback (most recent call last):
  File "test_stopit.py", line 3, in <module>
    with stopit.ThreadingTimeout(float(sys.argv[1])) as to_ctx_mgr:
  File "/usr/local/lib/python3.8/site-packages/stopit/utils.py", line 73, in __enter__
    self.setup_interrupt()
  File "/usr/local/lib/python3.8/site-packages/stopit/threadstop.py", line 60, in setup_interrupt
    self.timer.start()
  File "/usr/lib/python3.8/threading.py", line 857, in start
    self._started.wait()
  File "/usr/lib/python3.8/threading.py", line 558, in wait
    signaled = self._cond.wait(timeout)
  File "/usr/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()
stopit.utils.TimeoutException
[repeats]

Very small timeout, two different results

>: while :; do python3 test_stopit.py 0.0000000001; done
done 2
done 2
[repeats 5 times]
done 2
Traceback (most recent call last):
  File "test_stopit.py", line 3, in <module>
    with stopit.ThreadingTimeout(float(sys.argv[1])) as to_ctx_mgr:
  File "/usr/local/lib/python3.8/site-packages/stopit/utils.py", line 73, in __enter__
    self.setup_interrupt()
  File "/usr/local/lib/python3.8/site-packages/stopit/threadstop.py", line 60, in setup_interrupt
    self.timer.start()
  File "/usr/lib/python3.8/threading.py", line 857, in start
    self._started.wait()
  File "/usr/lib/python3.8/threading.py", line 558, in wait
    signaled = self._cond.wait(timeout)
  File "/usr/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()
stopit.utils.TimeoutException
done 2
done 2
Traceback (most recent call last):
 [as above]
stopit.utils.TimeoutException
Traceback (most recent call last):
  [as above]
stopit.utils.TimeoutException
done 2
[etc., _ad nauseum]

The above were run under Cygwin, Python 3.8.10.
Under Ubuntu focal, Python 3.8.10, the zero case fails as above, but the very small timeout always successfully gives done 2.
Likewise with Debian buster, Python 3.7.3 and several other Linux distros.
Under Windows 10, Python 3.8.10 however, the zero case sometimes fails as above (TimeoutException under waitout.acquire), and sometimes trips the assertion error!

AssertionError: state is 2, not 1

For the very small timeout value, we get some successes, i.e. 'done 2', and also some failures, but the failure surfaces at a different point:

>python test_stopit.py 0.0000000001
done 0
Traceback (most recent call last):
  File "D:\work\jar\doi\pdf1\test_stopit.py", line 9, in <module>
    print('done',to_ctx_mgr.state)
stopit.utils.TimeoutException

Conclusions
It would be nice to

  1. Handle a timeout of zero in stopit itself, or at the very least include a warning in the documentation;
  2. Debug the race condition on Windows/Cygwin (I'm assuming the underlying problem is the same), but this probably needs to involve someone familiar with the internals of the threading module...

Test code

import stopit, sys

with stopit.ThreadingTimeout(float(sys.argv[1])) as to_ctx_mgr:
    assert to_ctx_mgr.state == to_ctx_mgr.EXECUTING, \
            "state is %s, not %s"%(to_ctx_mgr.state, to_ctx_mgr.EXECUTING)
    for i in range(1000000):
      if i>0:
        pass

print('done',to_ctx_mgr.state)