rtomayko/posix-spawn

POSIX::Spawn.` slower than Kernel.` on MRI 2.2

Opened this issue · 3 comments

I noticed that posix-spawn's implementation of the back-tick operator is significantly slower than the built-in implementation of MRI ruby 2.2.2 on both Linux and Mac OS X:

require "benchmark"
require "posix/spawn"
n = 10_000
cmd = "/bin/echo".freeze
Benchmark.bm do |x|
  x.report("Kernel.\`")       { n.times { Kernel.`       cmd } }
  x.report("POSIX::Spawn.\`") { n.times { POSIX::Spawn.` cmd } }
end

Result on Mac OS X 10.11 Beta 6, Darwin 15.0.0 x86_64, ruby 2.2.2p133, posix-spawn 0.3.11:

                    user     system      total        real
Kernel.`        0.800000   5.470000  42.220000 ( 42.443894)
POSIX::Spawn.` 14.350000   2.440000 100.060000 (102.485969)

Result on Ubuntu Server 14.04.3 LTS, Linux 3.13.0 x86_64, ruby 2.2.2p95, posix-spawn 0.3.11:

                    user     system      total        real
Kernel.`        1.080000   0.810000  18.140000 ( 17.168791)
POSIX::Spawn.`  1.440000   1.030000  34.150000 ( 32.650234)

I tried to run the above benchmark script from rails c and irb on ubuntu & ruby 2.4

#rails c
                    user     system      total        real
Kernel.`        0.660000  12.920000  13.670000 ( 34.030676)
POSIX::Spawn.`  1.060000   1.220000   2.280000 ( 16.301847)
#irb
                    user     system      total        real
Kernel.`        0.410000   2.460000   2.870000 ( 10.719521)
POSIX::Spawn.`  0.740000   0.850000   1.590000 ( 11.552479)

Looks like posix-spawn performs significantly better when the parent process is memory huge e.g., rails worker.

jjb commented

Thank you @felixbuenemann for starting this discussion and providing your methodology and results.

I think the raw nanosecond overhead doesn't detract from the value of posix-spawn from any use case I can think of. @mintuhouse's benchmark is a great example, and here's another take on an extremely simplistic "more realistic" benchmark, this time running a process that isn't a pure NOOP like echo but actually takes time to do something.

require "benchmark"
require "posix/spawn"
n = 500
cmd = "sleep .01"
Benchmark.bm do |x|
  GC.start
  x.report("Kernel.\`")       { n.times { Kernel.`       cmd } }
  GC.start
  x.report("POSIX::Spawn.\`") { n.times { POSIX::Spawn.` cmd } }
end
$ ruby posix-spawn-benchmark.rb
       user     system      total        real
Kernel.`               0.060000   0.040000   0.100000 (  5.747498)
POSIX::Spawn.`  0.044000   0.056000   0.100000 (  6.134681)
jjb commented

I actually tried to make a version of @felixbuenemann benchmark which had a parent process with a lot of memory

n = 10_000
cmd = "/bin/echo".freeze
string = ""
1_000_000_00.times{ string << "hello"}

But no matter how big I made it I couldn't get posix-spawn to be faster. posix spawn was always 2x the time.

Then I made this benchmark to test memory used because of forking.

require "benchmark"
require "posix/spawn"
n = 10_000
cmd = "free -m"
string = ""
1_000_000_00.times{ string << "hello"}

puts "Sleeping for 10 seconds. Please run free -m directly in another shell"
sleep 10

GC.start
puts "Kernel.\`"
puts Kernel.` cmd                                                                  
                                                                                   
GC.start                                                                           
puts "POSIX::Spawn.\`"                                                             
puts POSIX::Spawn.` cmd
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3951         556        2687          21         706        3087
Swap:             0           0           0
$ ruby posix-spawn-benchmark.rb
Sleeping for 10 seconds. Please run free -m directly in another shell
Kernel.`
              total        used        free      shared  buff/cache   available
Mem:           3951         556        2687          21         706        3087
Swap:             0           0           0
POSIX::Spawn.`
              total        used        free      shared  buff/cache   available
Mem:           3951         556        2687          21         706        3087
Swap:             0           0           0

So this either means that copy-on-write is really doing its job and creating zero memory overhead, or the ruby backtick implementation changed at some point.

$ cat /etc/issue
Ubuntu 16.04.5 LTS \n \l

$ ruby --version
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux-gnu]