python/cpython

time.sleep (floatsleep()) should use clock_nanosleep() on Linux

shankarunni opened this issue · 36 comments

BPO 21302
Nosy @vstinner, @4kir4, @1st1, @eryksun, @Livius90
PRs
  • #28077
  • #28111
  • #28311
  • #28341
  • #28350
  • #28483
  • #28526
  • #28545
Files
  • wait.py
  • bench.py
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-09-25.12:37:55.425>
    created_at = <Date 2014-04-18.20:12:03.043>
    labels = ['type-bug', 'library', '3.11']
    title = 'time.sleep (floatsleep()) should use clock_nanosleep() on Linux'
    updated_at = <Date 2021-10-11.08:30:28.782>
    user = 'https://bugs.python.org/shankarunni'

    bugs.python.org fields:

    activity = <Date 2021-10-11.08:30:28.782>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-09-25.12:37:55.425>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2014-04-18.20:12:03.043>
    creator = 'shankarunni'
    dependencies = []
    files = ['50289', '50294']
    hgrepos = []
    issue_num = 21302
    keywords = ['patch']
    message_count = 32.0
    messages = ['216799', '216814', '217225', '217226', '217227', '217233', '217235', '401108', '401701', '401706', '401824', '402279', '402437', '402438', '402440', '402441', '402442', '402448', '402618', '402619', '402630', '402632', '403528', '403544', '403550', '403568', '403569', '403581', '403583', '403610', '403632', '403633']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'akira', 'python-dev', 'yselivanov', 'eryksun', 'shankarunni', 'Livius']
    pr_nums = ['28077', '28111', '28311', '28341', '28350', '28483', '28526', '28545']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue21302'
    versions = ['Python 3.11']

    I know that an earlier request to use nanosleep() has been rejected as "wontfix", but I'm filing this one for a different reason.

    Today, timemodule.c:floatsleep() calls select() on platforms that support it. On Linux, select() with a timeout has an unfortunate property that it is very sensitive to clock jumps, because it computes a sleep end time based on the current kernel timestamp.

    If the system clock is yanked back (by ntpd or another process), the process can end up sleeping for a very long time. For example, if the clock is yanked back by half an hour in the middle of a sleep(10), the process sleeps until "original_kernel_clock+10", which turns into a half-hour sleep.

    Yes, systems shouldn't jerk their clocks around, but we can't often control this sort of thing on end-user environments.

    Using clock_nanosleep(CLOCK_MONOTONIC, 0, <timespec>, NULL) makes the sleep a much more reliable thing, and mostly insensitive to such jumps. (It'll still be affected by any adjtime(), but that's OK in this case).
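To make the arithmetic concrete, here is a small model of the failure mode described above. It is a sketch only: `real_sleep_time` and its parameters are hypothetical names, and the model assumes a single clock step during the sleep rather than calling select() or any kernel API.

```python
def real_sleep_time(duration, jump_at, jump, wall_clock_deadline):
    """Return the real seconds that pass before a sleep wakes up.

    duration: requested sleep in seconds.
    jump_at: real seconds into the sleep at which the system clock is stepped.
    jump: size of the step in seconds (negative means the clock goes back).
    wall_clock_deadline: True if the wake-up time was computed from the
        system clock, False if it was computed from a monotonic clock.
    """
    if not wall_clock_deadline or jump_at >= duration:
        # A monotonic deadline ignores the step; so does a sleep that
        # has already finished before the step happens.
        return float(duration)
    # The deadline is start + duration on the system clock. Right after the
    # step, the system clock reads start + jump_at + jump.
    remaining = duration - (jump_at + jump)
    return float(jump_at + max(0.0, remaining))


if __name__ == "__main__":
    # sleep(10), clock stepped back 30 minutes after 5 seconds.
    print(real_sleep_time(10, 5, -1800, wall_clock_deadline=True))
    print(real_sleep_time(10, 5, -1800, wall_clock_deadline=False))
```

With these inputs the system-clock deadline yields 1810 seconds (just over half an hour), while the monotonic deadline yields the requested 10 seconds.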

    I'm working on a patch, but I noticed a similar issue in Condition.wait(), which also keeps re-evaluating the "remaining sleep time" based on the current kernel clock, with similar effects.

    I'll try to address both issues, or we could open a separate bug for the latter.

    > I know that an earlier request to use nanosleep() has been rejected as "wontfix"

    It was issue bpo-13981. I created that issue while I was working on PEP-410 (nanosecond timestamps). I closed it myself; that doesn't mean Python must not use the function, only that I didn't want to work on it anymore at the time.

    "I'm working on a patch, but I noticed a similar issue in Condition.wait(), which also keeps re-evaluating the "remaining sleep time" based on the current kernel clock, with similar effects."

    I see that Lock.acquire(timeout) uses the C function gettimeofday() to recompute the timeout if acquiring the lock was interrupted (C error "EINTR"). It would be better to use a monotonic clock here, but please open a new issue because it's unrelated to nanosleep().
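As an illustration of recomputing a timeout against a monotonic clock, here is a minimal sketch. It is not the C code of Lock.acquire(); `call_with_timeout` and `attempt` are hypothetical names, and `InterruptedError` stands in for a wait that was cut short with EINTR.

```python
import time


def call_with_timeout(attempt, timeout, clock=time.monotonic):
    """Call attempt(remaining) until it succeeds or `timeout` seconds pass.

    `attempt` is a hypothetical callable that waits up to `remaining`
    seconds and returns True on success. It may raise InterruptedError
    to model a wait interrupted by a signal.
    """
    deadline = clock() + timeout
    remaining = timeout
    while True:
        try:
            if attempt(remaining):
                return True
        except InterruptedError:
            pass  # interrupted: fall through and recompute the timeout
        # The remaining time is measured on the monotonic clock, so a
        # change of the system clock cannot stretch or shrink it.
        remaining = deadline - clock()
        if remaining <= 0:
            return False
```

The `clock` parameter exists only so that the loop can be exercised with a fake clock; in normal use the default, time.monotonic(), is the point of the exercise.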

    Or did you find another bug?

    By the way, you didn't mention the Python version. Are you working on Python 2.7 or 3.5?

    See also the PEP-418.

    If you want to modify time.sleep(), you must be careful of the portability: Windows, Linux, but also Mac OS X, FreeBSD, Solaris, etc.

    Try to describe the behaviour of each underlying C function on each platform to be able to describe the "portable behaviour" on all platforms, especially the expected behaviour when the system clock is changed (is time.sleep impacted or not? always?) and the expected behaviour when the system is suspended.

    For example, it looks like nanosleep() uses a different clock depending on OS (Linux uses CLOCK_MONOTONIC, other UNIX platforms use CLOCK_REALTIME).
    http://lists.gnu.org/archive/html/bug-coreutils/2012-08/msg00087.html

    I know that you suggest using clock_nanosleep(), but this function is not available on all platforms. For example, I would not use it on Windows.

    Another example (on Fedora?): "sleep() ignores time spent with a suspended system"
    http://mjg59.dreamwidth.org/7846.html

    You should also decide how to handle an interrupted sleep (C error "EINTR"). Currently, the sleep is interrupted and no error is raised.

    I began to describe all these functions in the PEP-418, even if I didn't change the implementation with the PEP:
    http://legacy.python.org/dev/peps/pep-0418/#sleep

    > If you want to modify time.sleep(), you must be careful of the portability: Windows, Linux, but also Mac OS X, FreeBSD, Solaris, etc.

    Oh, I totally agree. What I'm trying to do is define another autoconf flag (HAVE_CLOCK_NANOSLEEP), set by a feature test, and use clock_nanosleep() only when that flag is enabled.

    Now that's a good point that if we have clock_nanosleep() on another platform (non-Linux) and it does the wrong thing, then I might have to add further discrimination.

    For now, one sticking point that I've stumbled across is that clock_nanosleep() requires "-lrt". Complicates the autoconf check a bit.
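The same kind of feature test can be sketched at run time with ctypes. This is only an analogue of the configure check, not the autoconf macro itself: `find_clock_nanosleep` is a hypothetical helper, and the assumption is that the symbol lives either in the already-loaded C library or in librt.

```python
import ctypes
import ctypes.util


def find_clock_nanosleep():
    """Return the C clock_nanosleep function, or None if it is not found."""
    # None asks for the symbols already loaded into the process (libc on
    # most Unix systems); librt is tried next because some C libraries
    # ship clock_nanosleep() there.
    candidates = [None]
    librt = ctypes.util.find_library("rt")
    if librt:
        candidates.append(librt)
    for name in candidates:
        try:
            lib = ctypes.CDLL(name, use_errno=True)
        except (OSError, TypeError):
            continue  # library cannot be loaded on this platform
        func = getattr(lib, "clock_nanosleep", None)
        if func is not None:
            return func
    return None


if __name__ == "__main__":
    print("clock_nanosleep available:", find_clock_nanosleep() is not None)
```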

    2014-04-27 2:26 GMT+02:00 Shankar Unni <report@bugs.python.org>:

    >> If you want to modify time.sleep(), you must be careful of the portability: Windows, Linux, but also Mac OS X, FreeBSD, Solaris, etc.

    > Oh, I totally agree. What I'm trying to do is define another autoconf flag (HAVE_CLOCK_NANOSLEEP), set by a feature test, and use clock_nanosleep() only when that flag is enabled.

    I'm talking about the expected behaviour which can be found in the
    documentation of the function:
    https://docs.python.org/dev/library/time.html#time.sleep

    Can you review my final implementation?
    #28111

    New changeset 85a4748 by Livius in branch 'main':
    bpo-21302: Add clock_nanosleep() implementation for time.sleep() (GH-28111)
    85a4748

    New changeset 85dc53a by Victor Stinner in branch 'main':
    bpo-21302: Update time.sleep() doc for clock_nanosleep() (GH-28311)
    85dc53a

    New changeset b49263b by Victor Stinner in branch 'main':
    bpo-21302: Add _PyTime_AsNanoseconds() (GH-28350)
    b49263b

    wait.py: script to test the time.sleep() function. Press CTRL+C multiple times during the sleep to check that even if the sleep is interrupted, time.sleep() sleeps the expected duration.
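The attached wait.py is not reproduced here. The following sketch checks the same property automatically, under the assumption of a Unix system and Python 3.5 or newer: a periodic SIGALRM takes the place of pressing CTRL+C, and `sleep_with_interruptions` is a hypothetical name.

```python
import signal
import time


def sleep_with_interruptions(duration, interval):
    """Sleep `duration` seconds while SIGALRM fires every `interval` seconds.

    Returns (elapsed_seconds, signals_handled). Unix only, and it must be
    called from the main thread because it installs a signal handler.
    """
    count = 0

    def handler(signum, frame):
        nonlocal count
        count += 1  # the handler does not raise, so the sleep is resumed

    previous = signal.signal(signal.SIGALRM, handler)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        start = time.monotonic()
        time.sleep(duration)
        elapsed = time.monotonic() - start
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # stop the periodic timer
        signal.signal(signal.SIGALRM, previous)
    return elapsed, count


if __name__ == "__main__":
    elapsed, count = sleep_with_interruptions(1.0, 0.1)
    print(f"slept {elapsed:.3f} s, interrupted {count} times")
```

If the sleep is resumed correctly after each signal, the elapsed time is at least the requested duration even though the handler ran several times.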

    New changeset 58f8adf by Victor Stinner in branch 'main':
    bpo-21302: time.sleep() uses waitable timer on Windows (GH-28483)
    58f8adf

    Livius: your first PR modified Sleep() in Modules/_tkinter.c to use nanosleep(). I don't see the point, since this function has a resolution of 1 ms (10^-3 s). Using select() on Unix is enough: it has a resolution of 1 us (10^-6 s).

    bench.py: measure the shortest possible sleep. Use time.sleep(1e-10): 0.1 nanosecond. It should be rounded to the resolution of the used sleep function, like 1 ns on Linux.
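The attached bench.py is not reproduced here. A rough stand-in that measures the same thing with only the standard library could look like the sketch below; `measure_sleep` and the sample count are assumptions, and the numbers it prints are not comparable to the figures quoted in this thread.

```python
import statistics
import time


def measure_sleep(requested, samples=200):
    """Return (mean, stdev) of the real duration of time.sleep(requested)."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


if __name__ == "__main__":
    # 1e-10 s is far below any timer resolution, so the result approximates
    # the shortest sleep the platform can actually perform.
    mean, stdev = measure_sleep(1e-10)
    print(f"Mean +- std dev: {mean * 1e6:.1f} us +- {stdev * 1e6:.1f} us")
```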

    On Linux with Fedora 34 Python 3.10 executable, I get:

    Mean +- std dev: 60.5 us +- 12.9 us (80783 values)
    

    On Windows with a Python 3.11 debug build, I get:

    Mean +- std dev: 21.9 ms +- 7.8 ms (228 values)
    

    Sadly, it seems that on Windows 10, one of the following functions still uses the infamous 15.6 ms resolution:

    • CreateWaitableTimerW()
    • SetWaitableTimer()
    • WaitForMultipleObjects()

    > On Windows with a Python 3.11 debug build, I get:
    > Mean +- std dev: 21.9 ms +- 7.8 ms (228 values)

    I wrote an optimization to cache the Windows timer handle between time.sleep() calls (instead of closing it). I don't think it's needed, because the shortest sleep is about 15.6 ms and CreateWaitableTimerW() is likely much faster than that. So this optimization is basically useless.

    Livius: do you care about using nanosleep(), or can I close the issue?

    See also bpo-19007: "precise time.time() under Windows 8: use GetSystemTimePreciseAsFileTime".

    New changeset 7834ff2 by Victor Stinner in branch 'main':
    bpo-21302: Add nanosleep() implementation for time.sleep() in Unix (GH-28545)
    7834ff2

    Thanks Livius for all these nice enhancements!

    Do you have any information about when it will be released in 3.11?

    > Do you have any information about when it will be released in 3.11?

    Here is a schedule of Python 3.11 releases:
    https://www.python.org/dev/peps/pep-0664/

    In the meantime, you can develop a C extension to get the feature.

    https://www.python.org/downloads/windows/
    "Note that Python 3.10.0 cannot be used on Windows 7 or earlier."

    vstinner: Is it true that Windows 7 is no longer a supported OS? If so, we no longer need to care about compatibility with Windows 7 and earlier, and it is time to use GetSystemTimePreciseAsFileTime() in time.time(), and in time.sleep() for absolute sleeping, because it has been available since Windows 8.

    > it is time to use GetSystemTimePreciseAsFileTime() in time.time()

    See bpo-19007 for that.

    > it is time to (...) time.sleep() as absolute sleeping because it is available since Windows 8.

    In Python 3.11, time.sleep() is now always implemented with a waitable timer. I chose to use a relative timeout since it's simpler to implement. Is there any benefit of calling SetWaitableTimer() with an absolute timeout, compared to calling it with a relative timeout?

    > In Python 3.11, time.sleep() is now always implemented with a
    > waitable timer.

    A regular waitable timer in Windows becomes signaled with the same resolution as Sleep(). It's based on the current interrupt timer period, which can be lowered to 1 ms via timeBeginPeriod(). Compared to Sleep() it's more flexible in terms of periodic waits, WaitForMultipleObjects(), or MsgWaitForMultipleObjects() -- not that time.sleep() needs this flexibility.

    That said, using a waitable timer leaves the door open for improvement in future versions of Python. In particular, it's possible to get higher resolution in newer versions of Windows 10 and Windows 11 with CreateWaitableTimerExW() and the undocumented flag CREATE_WAITABLE_TIMER_HIGH_RESOLUTION (2).

    An absolute timeout implementation via SetWaitableTimer() and GetSystemTimePreciseAsFileTime() is always better, because it can reduce the "waste time" or "overhead time" that always exists in any simple interval sleep implementation. Moreover, it is the only one that is equivalent to the clock_nanosleep() implementation on Linux, which is now the state-of-the-art implementation for precise sleeping.

    So, in my opinion, an absolute timeout implementation could be the most modern and sustainable choice for future versions of Python.

    > Is there any benefit of calling SetWaitableTimer() with an
    > absolute timeout

    No, the due time of a timer object is stored in absolute interrupt time, not absolute system time. This has to be calculated either way, and it's actually more work for the kernel if an absolute system time is passed.

    It is not true that there are no benefits. Using an absolute timeout can reduce the overhead of any variable and object initialization that happens before WaitForMultipleObjects(), which performs the real sleeping via a blocking wait in pysleep(). GetSystemTimePreciseAsFileTime() must be called as early as possible in pysleep(). This is the same as the Linux implementation via clock_nanosleep().

    So, using an absolute timeout and GetSystemTimePreciseAsFileTime() can improve the accuracy of the desired sleep time. For example, if sleep = 2.0 sec, then the real relative sleep time = 2.001234 sec, but the absolute sleep time = 2.000012 sec.

    The benefits are not in the technical background, but rather in the use case.

    In other words, using an absolute timeout can eliminate the systematic error in the desired sleep time.
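The argument in this comment can be written down as a short idealized model. It is a sketch of the claimed arithmetic only: `time_until_return` is a hypothetical name, the timer is assumed to fire exactly on time, and timer resolution (which the replies that follow treat as the deciding factor on Windows) is ignored.

```python
def time_until_return(requested, setup_overhead, absolute):
    """Seconds from entering a sleep function until it returns.

    requested: the sleep duration asked for, in seconds.
    setup_overhead: seconds spent between entering the function and
        starting the blocking wait (argument parsing, timer creation).
    absolute: True if the deadline is fixed on entry, False if the full
        requested duration is waited after the setup work.
    """
    if absolute:
        # The deadline was taken on entry, so the wait only covers what
        # is left once the setup work is done.
        wait = max(0.0, requested - setup_overhead)
    else:
        wait = requested
    return setup_overhead + wait


if __name__ == "__main__":
    print(time_until_return(2.0, 0.001234, absolute=False))
    print(time_until_return(2.0, 0.001234, absolute=True))
```

In this model the relative variant overshoots by exactly the setup overhead, while the absolute variant returns after the requested duration as long as the overhead is shorter than the sleep.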

    > Absolute timeout using can reduce the overhead time of any variable
    > and object initialization cost before the WaitForMultipleObjects()

    Again, timer objects store the due time in interrupt time, not system time (i.e. InterruptTime vs SystemTime in the KUSER_SHARED_DATA record). The due time gets set as the current interrupt time plus a relative due time. If the due time is passed as absolute system time, the kernel just computes the delta from the current system time.

    The timer object does record whether the requested due time is an absolute system time. This allows the kernel to recompute all absolute due times when the system time is changed manually. This is also the primary reason one wouldn't implement time.sleep() with absolute system time.

    > using absolute timeout and GetSystemTimePreciseAsFileTime() can
    > improve the accuracy of the desired sleep time.

    It would not improve the resolution. Timer objects are signaled when their due time is at or before the current interrupt time. The latter gets updated by the timer interrupt service routine, by default every 15.625 ms -- or at least that used to be the case.

    The undocumented flag CREATE_WAITABLE_TIMER_HIGH_RESOLUTION creates a different timer type, called an "IRTimer" (implemented in Windows 8.1, but back then only accessible in the NT API). This timer type is based on precise interrupt time, which is interpolated using the performance counter. I don't know how the implementation of the timer interrupt has changed to support this increased resolution. It could be that the default 15.625 ms interrupt period is being simulated for compatibility with classic timers and Sleep(). I'd love for the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag and the behavior of IRTimer objects to be documented.

    > That said, using a waitable timer leaves the door open for improvement in future versions of Python. In particular, it's possible to get higher resolution in newer versions of Windows 10 and Windows 11 with CreateWaitableTimerExW() and the undocumented flag CREATE_WAITABLE_TIMER_HIGH_RESOLUTION (2).

    I created bpo-45429 "[Windows] time.sleep() should use CREATE_WAITABLE_TIMER_HIGH_RESOLUTION".

    This issue is closed. If you consider that time.sleep() has a bug or could be enhanced, please open a new issue.

    Just to highlight some results, I did some test runs with bench.py:

    Linux - i.MX7D ARM

    Python 3.8
    root@myARM:~# python3 bench.py
    time.sleep(0.001) benchmark...
    Mean +- std dev: 1.1 ms +- 22.5 us (26525 values)
    
    root@myARM:~# python3 bench.py
    time.sleep(0.0001) benchmark...
    Mean +- std dev: 227.1 us +- 18.6 us (125174 values)

    Python 3.11
    root@myARM:~# python3 bench.py
    time.sleep(0.001) benchmark...
    Mean +- std dev: 1.1 ms +- 6.4 us (27686 values)
    
    root@myARM:~# python3 bench.py
    time.sleep(0.0001) benchmark...
    Mean +- std dev: 170.2 us +- 5.7 us (164893 values)

    Windows 10 - i7-10850H PC

    Python 3.10
    time.sleep(0.001) benchmark...
    Mean +- std dev: 15.9 ms +- 350.9 us (1889 values)
    
    time.sleep(0.0001) benchmark...
    Mean +- std dev: 15.9 ms +- 341.1 us (1888 values)

    Python 3.11
    time.sleep(0.001) benchmark...
    Mean +- std dev: 1.6 ms +- 339.6 us (19180 values)
    
    time.sleep(0.0001) benchmark...
    Mean +- std dev: 574.6 us +- 95.2 us (51837 values)

    While I'm assuming this has to do with something we'll discover on our internal production platforms at work... is anyone else seeing time.sleep hang when you've compiled your 3.11 runtime such that it uses clock_nanosleep() instead of the old original select() code?

    > While I'm assuming this has to do with something we'll discover on our internal production platforms at work... is anyone else seeing time.sleep hang when you've compiled your 3.11 runtime such that it uses clock_nanosleep() instead of the old original select() code?

    Python calls clock_nanosleep() on clock CLOCK_MONOTONIC. It computes the absolute time using _PyTime_GetMonotonicClockWithInfo(). On Linux, _PyTime_GetMonotonicClockWithInfo() should be clock_gettime(CLOCK_MONOTONIC). It should be the same clock.

    Maybe check that both code paths use the same clock?
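One way to exercise that code path outside the interpreter is to call clock_nanosleep() directly through ctypes. The sketch below is Linux-only and is not CPython's implementation: `monotonic_sleep` is a hypothetical name, the `TIMER_ABSTIME` value and the `timespec` layout are assumptions that hold on common 64-bit Linux systems, and the deadline is read from the same CLOCK_MONOTONIC clock that is passed to the sleep call.

```python
import ctypes
import errno
import os
import time

TIMER_ABSTIME = 1  # assumed value of the Linux <time.h> constant


class Timespec(ctypes.Structure):
    # Assumes time_t and the tv_nsec field are both C long.
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]


def monotonic_sleep(seconds):
    """Sleep until an absolute CLOCK_MONOTONIC deadline (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    clock_nanosleep = libc.clock_nanosleep
    clock_nanosleep.argtypes = [
        ctypes.c_int,              # clockid_t
        ctypes.c_int,              # flags
        ctypes.POINTER(Timespec),  # requested time
        ctypes.POINTER(Timespec),  # remaining time (unused here)
    ]
    clock_nanosleep.restype = ctypes.c_int

    # Read the deadline from the same clock that the sleep call uses.
    now_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    deadline_ns = now_ns + int(seconds * 1e9)
    ts = Timespec(deadline_ns // 10**9, deadline_ns % 10**9)

    while True:
        # clock_nanosleep() returns the error number directly instead of
        # setting errno.
        err = clock_nanosleep(time.CLOCK_MONOTONIC, TIMER_ABSTIME,
                              ctypes.byref(ts), None)
        if err == 0:
            return
        if err != errno.EINTR:
            raise OSError(err, os.strerror(err))
        # EINTR: the deadline is absolute, so simply wait again.
```

If this function hangs on a given machine while a plain select()-based sleep does not, that points at the monotonic clock of the environment rather than at the interpreter.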

    We found a broken CLOCK_MONOTONIC in some environments. We've worked around it until that gets fixed.