ruby-concurrency/concurrent-ruby

should the thread_local_var_accessor gem be included within the concurrent-ruby gem?

Opened this issue · 4 comments

aks commented

I've written a little gem that makes concurrent-ruby's thread-local variables easy to use, in the same manner of attr_reader, attr_writer, attr_accessor: it is called: thread_local_var_accessors

See:

Eg:

tlv_reader :var1
tlv_writer :var2
tlv_accessor: :var3

I've been using it in my company software for some time now. It is an easy way to encourage developers to improve the thread-safetyness of their code, especially code that runs in a multi-threaded environment (eg: sidekiq tasks).

I'm happy to continue maintaining it separately, but I'm also happy to contribute it to the core concurrent-ruby gem, if that's appropriate. In the latter case, since we are already using concurrent-ruby elsewhere, for other reasons, it would mean one gem less to track for us. 😄

eregon commented

Thanks for writing up the proposal.
I don't know how often this is useful, for example if it's a ThreadLocalVar which can be stored e.g. in a constant/globally there is no need for this.

I took a quick look at the code in https://github.com/aks/thread_local_var_accessors.
One thing which does not seem thread-safe is https://github.com/aks/thread_local_var_accessors/blob/d5d4fd3418d32c1901f719da603f652b7f1cea17/lib/thread_local_var_accessors.rb#L280-L285 specifically for:

class Foo
  tlv_accessor: :tvar
  def initialize
  end

  def foo
    self.tvar = true
    p tvar
end

foo = Foo.new
8.times.map { Thread.new { foo.foo } }

Some threads might print nil instead of true because multiple threads might assign @tvar and so overwrite each other.

eregon commented

Given this seems mostly convenience on top of ThreadLocalVar to be a bit shorter (e.g. tvar/self.tvar = v vs @tvar.value/@tvar.value = v), I think it's best as its own gem.

And the fact the ThreadLocalVar is created more magically/behind-the-scenes creates the race mentioned above.
Although that could be solved e.g. by requiring to assign the ivar in initialize, and raising an exception on read/write if it's not the case. (or using a global or per-class lock but that would create lots of contention so not a good solution)

aks commented

I corrected your test example above, and ran it with thousands of threads. At no point was there ever any error.

However, I'm fine keeping the library in a separate gem.
Thanks for your reply and review.

Here's the updated test program and a sample output:

% cat tlvtest.rb
#!/usr/bin/env ruby
require 'thread_local_var_accessors'

class Foo
  include ThreadLocalVarAccessors

  tlv_accessor :tvar

  def initialize
  end

  def foo(num)
    self.tvar = num
    tvar
  end
end

obj = Foo.new

count = ARGV.shift&.to_i || 100

puts "Count: #{count}"

data_in = count.times.to_a
data_out = data_in.map { |num| Thread.new { obj.foo(num) }.run.value }

diffs = data_in - data_out
if diffs.size.zero?
  puts "no diffs"
else
  puts "Diffs: #{diffs}"
end

And, here's the output:

% $ time ./tlvtest.rb 100000
Count: 100000
no diffs
./tlvtest.rb 100000  2.11s user 1.47s system 113% cpu 3.145 total

Apologies for reviving a thread (and for a long post)

@aks Your test program uses Thread#run which joins the thread implicitly, making code in essence sequential. It needs to be something like this:

data_in = count.times.to_a
threads = data_in.map { |num| Thread.new { obj.foo(num) } }
loop { break if threads.none?(&:alive?) }
data_out = threads.map(&:value)

The code does seem to be thread-safe:

$ ruby tlvtest.rb 10000
Count: 10000
no diffs

However, this is still not a proper test. Short-lived threads are pretty much guaranteed to run to completion without exhibiting races (at least on CRuby). For example, this modified test "proves" that instance variables are thread-safe:

#!/usr/bin/env ruby

require "thread_local_var_accessors"

class Foo
  include ThreadLocalVarAccessors

  tlv_accessor :tvar

  def foo(num)
    self.tvar = num
    tvar
  end
end

class Bar
  def foo(num)
    @var = num
    @var
  end
end

count = ARGV.shift&.to_i || 100
puts "Count: #{count}"
$data_in = count.times.to_a

def run_test(name, klass)
  puts "#{name}:"
  obj = klass.new
  threads = $data_in.map { |num| Thread.new { obj.foo(num) } }
  loop { break if threads.none?(&:alive?) }
  data_out = threads.map(&:value)

  diffs = $data_in - data_out
  if diffs.empty?
    puts "no diffs"
  else
    puts "Diffs: #{diffs}"
  end
end

run_test("Foo (TVar)", Foo)
run_test("Bar (Instance Var)", Bar)

Output:

$ ruby tlvtest.rb 100000
Count: 100000
Foo (TVar):
no diffs
Bar (Instance Var):
no diffs

If the #foo method is modified to explicitly allow passing execution to a different thread, suddenly we get problems:

class Foo
  include ThreadLocalVarAccessors

  tlv_accessor :tvar

  def foo(num)
    self.tvar = num
    Thread.pass
    tvar
  end
end

class Bar
  def foo(num)
    @var = num
    Thread.pass
    @var
  end
end

Output:

ruby tlvtest.rb 10   
Count: 10
Foo (TVar):
no diffs
Bar (Instance Var):
Diffs: [0, 1, 2, 3, 4, 5, 6, 9]

Be aware however that this doesn't really prove or disprove thread-safety of your code, you would need to test it with modified inner workings, by making it possible to switch threads in problematic spots. I, personally, think that it should be safe, but I'm no expert.