ruby-concurrency/concurrent-ruby

Synchronization layer

pitr-ch opened this issue · 12 comments

Hello,

I would like to build a common synchronization layer in concurrent-ruby having following API:

def Concurrent.lock(object); end #reentrant
def Concurrent.try_lock(object); end
def Concurrent.unlock(object); end

def Concurrent.synchronize(object, &block); end

def Concurrent.wait(object, timeout = nil); end # as Java Object#wait
def Concurrent.signal(object); end # as Java Object#notify
def Concurrent.broadcast(object); end # as Java Object#notifyAll

Which would have different implementation based on Ruby interpreter.

I kindly ask @brixen @headius @hone @ko1 @yorickpeterse @zzak to help with finding the best options for the different implementations. This PR is intended to serve to discus the options and issues, I'll open PR with implementation later to discuss details based on what is figured out here.

Proposed implementations

MRI

Injecting Monitor to an object, something like:

    require 'monitor'
    INJECTION_BARRIER = Mutex.new

    def synchronize(object)
      get_mutex(object).synchronize { yield }
    end

    def lock(object)
      get_mutex(object).enter
    end

    def unlock(object)
      get_mutex(object).exit
    end

    def get_mutex(object)
      object.instance_variable_get(:@__monitor__) or INJECTION_BARRIER.synchronize do
        object.instance_variable_get(:@__monitor__) or
            object.instance_variable_set(:@__monitor__, Monitor.new)
      end
    end

Similarly a ConditionVariable would be injected to support wait, signal, broadcast.

JRuby

Delegate to methods on JRuby module, see this PR jruby/jruby#2594.

It uses wait, notify, notifyAll methods on Java object to implement wait, signal, broadcast. synchronize runs a Block wrapped in synchronized Java keyword. lock, unlock, try_lock are using methods monitorEnter, monitorExit, tryMonitorEnter on sun.misc.Unsafe.

Rubinius

Implement lock, unlock, try_lock and synchronize by delegation to same methods on Rubinius module. wait, signal, broadcast would be supported by injecting ConditionalVariable into object's instance variable (when locked).

lock, unlock, try_lock are not really needed for concurrent-ruby so far, we will be fine with synchronize so they can be omitted if it turns out to be a bad idea to use sun.misc.Unsafe for its implementation on JRuby.

Rubinius currently provides the following methods for this:

The last two are mainly useful for thread scheduling/management and not so much for synchronization of arbitrary objects.

Some examples of these synchronization methods can be found in the Mutex class: https://github.com/rubinius/rubinius/blob/54544961d044a93f5b67dd87fb43c4b923bcc811/kernel/common/mutex.rb

Generally I'd say it's better to use implementation specific APIs here versus injecting a Mutex/Monitor. Given you inject objects you'll need a name for the instance variable, and that's bound to conflict with something sooner or later unless you'd call it something like @__concurrentruby_really_this_is_our_private_mixin_instance_variable_dont_touch_it_. Also, the injecting of objects might have to be synchronized itself, I'd have to look if we have any barriers in place in Rubinius to mitigate that need.

@chrisseaton gave me a good question on concurrent-ruby chat: "What is the motivation?" Sorry for omitting in the first place, let me fix that.

There are essentially 2 reasons for this: First is performance, allowing the best option to be used on any given Ruby implementation.

Second is following code which is not guaranteed to be safe on JRuby and Rubinius, AFAIK.

def initialize
  @mutex = Mutex.new
end
def do_something
  @mutex.synchronize { something } 
end      

There are no locks around method calls and instance variable access is not volatile, so if the object is initialized one thread and do_something called on another there is no guarantee that the @mutex will be visible. If you look into JRuby source code the instance variable access actually has volatile behavior [1] but the documentation says otherwise, meaning the implementation may change. Regarding Rubinius I am little more in the dark, based on my googling and code digging it looks the problem applies too.

This can be solved by using Rubinius.synchronize or JRuby.reference0(object).synchronized to synchronize around the object without need to access mutex in instance variable. Or with the layer in place by def do_something; Concurrent.synchronize(self) { something }; end.

[1] https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/runtime/ivars/StampedVariableAccessor.java https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/runtime/ivars/SynchronizedVariableAccessor.java

The code example @pitr-ch shows, with a mutex being created in the constructor, is something we do all the time in this gem. I'll be honest that these low-level implementation details aren't my forte. My position is that if the aforementioned code sample cannot provide guaranteed behavior across the three major interpreters, then we should do something about it. The common synchronization layer being proposed makes sense to me, but I trust the judgement of @pitr-ch and @chrisseaton.

I'm fairly sure that the above snippet is thread-safe at least on Rubinius as creating of instance variables is not deferred to the background (or otherwise done in a non blocking manner). Also, since the variable is set up in the initialize method it should be fully set up before any thread can even use it.

The following case however could be problematic depending on the implementation:

def do_something
  mutex.synchronize { something }
end

def mutex
  @mutex ||= Mutex.new # Unless ||= is atomic N threads could attempt this at the same time
end

@yorickpeterse what you are saying makes sense with sequential consistency, but unless there is some kind of synchronisation or memory barrier, what makes you think that the write to @mutex will always be observed before the write to whatever variable or other location stores the object that contains the mutex, from the perspective of another thread? You may have written to A and then B, but another thread might see the write to B, but not A yet.

@yorickpeterse Thanks for your response!

I agree with you that it's better to use the API and I am suggesting the same for lock, unlock, try_lock and synchronize. But I could not find any methods on Rubinius which would directly support wait, signal, broadcast. ConditionVariable implementation on Rubinius uses Rubinius::Channel.

I like the variable name :) yeah it's not a perfect solution to inject into the objects. I am looking for a better one.

@pitr-ch Rubinius does not use any memory barrier for instance variables so they do not have volatile (Java definition) semantics.

We would not want to see ruby-concurrency injecting anything into objects on Rubinius. I can add the wait, signal, broadcast API that you listed as it's quite consistent with the existing API we have with eg Rubinius.lock(object), etc.

I agree injecting is not a good solution. Thanks @brixen for the offer to extend the Rubinius API! That will be great!

Addressed in #287.

Just to add more context for possible readers.

There was a Concurrent::Synchronization::Object created which provides methods synchronize, wait, signal, broadcast and other utilities to create volatile and final fields. concurrent-ruby classes then usually inherit from this class. Concurrent::Synchronization::Object has different implementations based on the particular behavior of a given Ruby implementation (or version). (This approach avoids any ugly ivar injections, as discussed in this issue.)

I'll finish better documentation for the Synchronization::Object first so it's more understandable and does not force anybody to dig through source code. Then I plan to contact Ruby implementors to help verify correctness of this layer. After that it'll be released in 0.9 and stabilized for 1.0 release. Tracked in #292.

👍