python/asyncio

Problem with lock acquire non-blocking optimization

u201701 opened this issue · 4 comments

(PS: I earlier tried creating an issue on the Python issue tracker but I wasn't quite sure of the exact problem, and I am still coming up to speed on asyncio. So my original report was dismissed twice, but not entirely on merits, but just because of communication gaps.)

Here is an attempt to explain it correctly:

Motivating Example

import asyncio

lock = asyncio.Lock()

def a ():  # logic is identical to b()
 yield from lock.acquire()
 for i in range(10):
  print('a: ' + str(i))
  if i % 2 == 0:
   lock.release()
   yield from lock.acquire()
 lock.release()

def b ():  # logic is identical to a()
 yield from lock.acquire()
 for i in range(10):
  print('b: ' + str(i))
  if i % 2 == 0:
   lock.release()
   yield from lock.acquire()
 lock.release()

asyncio.get_event_loop().run_until_complete(asyncio.gather(a(), b()))

print('done')

The expectation is that a() and b() will run concurrently. However, the way lock is implemented, one will run to completion before the other even starts.

May be this example is contrived, but there are other situations where you want to run cooperatively while using a lock around segments which may or may not block for safety of shared data.

Explanation

The reason one runs to completion before the other even starts is because of the following 3 lines of code in asyncio.lock.acquire:

if not self._locked and all(w.cancelled() for w in self._waiters):
    self._locked = True
    return True

I. e. when no other coroutine is waiting on the lock then yield from lock.acquire() will not yield control back to scheduler.

The problem is that the other coroutine does not even have a chance to register as a waiter because of this optimization.

Inefficiency of Workarounds

One could add a yield from asyncio.sleep(0) before the acquire or after the release. However in the common case when some coroutine has had a chance to register as a waiter, this will cause two yields instead of one. I. e. one extra, unnecessary return to the scheduler.

No Way to Predict Fast Path or Not

If asyncio.lock had a method that would let the caller know whether the lock would be acquired in the fast path or not then the caller could use this for an efficient workaround.

But, there isn't any such method or way to know this in advance.

Inconsistency with Documentation

Not entirely inconsistent, but the docstring of lock.acquire is at best ambiguous: "This method blocks until the lock is unlocked, then sets it to locked and returns True." It gives the impression that the method blocks although a precise reading indicates that it does not tell what happens if the lock is already unlocked.

This point is to suggest fixing the fundamental problem in the spirit of the documentation rather than tweak the documentation to document the current behavior.

Minimal Suggested Fix

At least allow the caller to know, after the fact, whether acquire() yielded to the scheduler or not. This could be done by setting a flag which could be interrogated in a new, separate call. Or, have acquire() return a tuple that includes an additional boolean that tells whether control was yielded or not.

Alternate Suggested Fixes

  1. Allow a caller to interrogate in advance whether or not acquire() will yield control or not. This amounts to an additional method that tells you how many non-cancelled waiters exist. This is not ideal because it requires counting waiters twice: once in advance and once when acquire() is called.

  2. Always yield control and do not optimize - a reasonable choice because locks in co-operative multi-tasking will typically have other coroutines waiting (otherwise you can get away without a lock itself).

Ideal Fix

My ideal fix is 2. above -- just do not optimize this and always yield control to the scheduler. Cooperative multitasking locks are qualitatively different from multi-threading locks and as far as I can see multitasking locks will typically have contention and therefore no need to optimize the fast path.

Thanks!

Hi,

Thanks for the detailed explanation! I understand the issue, but can't see why it needs to be fixed: if you use a lock in a coroutine, it's probably because you need to synchronize IO operations. They will be invoked with await/yield from and most likely yield to the scheduler.
In other words: in which "real-world" case do you think that an asyncio.Lock will be used without yielding?

Rather than yield from asyncio.sleep(0), you can use loop.call_soon(lock.release) to ensure that the lock will be released during the next loop iteration, which effectively makes yield from lock.acquire() to yield to the scheduler.

Thanks for taking the time to fully understand the issue. Typically application software fixes give significant weight to real-life use cases, whereas systems software such as libraries consider even remote cases. That's because there is a HUGE diversity of python users and if there is an issue, somebody or the other will encounter it. In other words, libraries such has this have to be a bit aspirational and strive to perfection.

That said, I dug deeper into this issue and found 2 things that make it moot.

  1. Numerous other asyncio coroutines do not block in the fast path, including those used for mutual exclusions. Two random examples: Event.wait() and Queue.get().
    Thus, firstly, fixing just Lock is of little consequence -- users must still be cautious about many other coroutines.
    Secondly, I am not sure whether it is wise for all co-routines or even just mutual exclusion ones to be "fixed" so that they always yield to the scheduler, even in the fast path. In fact, I lean towards preserving the current fast paths.
    Thirdly, I am inclined to believe that there should always be a way to tell whether the coroutine will yield to scheduler or not.

  2. There is indeed a method Lock.locked() that can predict whether Lock.acquire() will yield to scheduler or not, and Lock.locked() itself does not yield to the scheduler.
    This method can be used by minority users who may care about this condition. Hopefully other similar coroutines with a fast path also have similar predictor methods.

  3. This distantly relates to an earlier debate in #284 (not involving me) regarding the behavior of asyncio.sleep(0). The call to sleep(t) was actually de-optimized (in the context of this issue) to yield to scheduler even though it could just return without yielding when t == 0. This was done to avoid adding a new method to the library.
    Given the above discussion, it seems like a bad idea to have done so. I hope in light of the discussion in this issue, somebody decides to go with the original thought to add async.nop rather than overload sleep in an unusual way.

  4. Finally, I just noticed that the atypical behavior of asyncio.sleep is not even documented in the docstring. Hopefully the library maintainers at least fix this oversight.

Accordingly, I withdraw this issue. I do want to bring @1st1 's attention to this, especially regarding asyncio.nop(). Please close this issue at your convenience.

Closing this. It was an interesting discussion! I'll add, as I close it, that I made similar decisions implementing async.Queue. At first I wanted any statement in a coroutine like yield from q.get() or yield from q.put(data) to definitely yield to the scheduler and unblock any other waiting coroutine. Guido and Nikolay convinced me that it was better for q.get() and q.put() to take a fast path if possible and not unblock all waiting coroutines. In real life, coroutines do I/O and yield to the scheduler while they wait for the I/O to complete, and that naturally causes the sort of interleaved execution you expected in your code example.

Thanks for adding your explanation @ajdavis

I agree that all is fine given the presence of fast path functions such as Lock.locked() that can predict whether another coroutine will yield or take the fast path.

The anomaly of asyncio.sleep(0) does remain and I hope some day a asyncio.nop() is added or at least the behavior of sleep(0) is documented.

Thanks!