/pure

Automatic parallelism and lazy evaluation using pure functional programming in Ruby.

Primary LanguageRuby

Pure

Summary

Language-level support for automatic parallelism and lazy evaluation.

Synopsis

require 'pure'

geometry = Pure.define do
  def area(width, height)
    width*height
  end

  def width(border)
    7 + border
  end

  def height(border)
    5 + border
  end

  def border
    2
  end
end

# Compute the area using 3 parallel threads.
puts geometry.compute(3).area
# => 63

# We've done this computation.
puts((7 + 2)*(5 + 2))
# => 63

Install

% gem install pure

Or for the (non-gem) .tgz package,

% ruby install.rb [--uninstall]

Description

Pure imports aspects of the pure functional paradigm into Ruby.

Method and argument names have literal meaning within a Pure.define block. In the above example, the width argument to area corresponds, by its literal name, to the width method.

Pure does not modify any of the standard classes.

Pure has been tested on MRI 1.8.6, 1.8.7, 1.9.1, 1.9.2, and jruby-1.4.

Terminology

Pure.define returns a Module instance, called a pure module. The methods of a pure module are called pure functions.

DSL

The pseudo-keyword pure, an alias of Pure.define, is included in the global scope with require 'pure/dsl'. It is also made available by including Pure::DSL into a class or module.

Overrides

An options hash given to compute is used for overriding functions in the pure module. If the name of a function matches a hash key, the function will be replaced with the corresponding hash value.

require 'pure/dsl'

greet = pure do
  def hello(name)
    "Hello, #{name}."
  end

  def name
    "Bob"
  end
end

puts greet.compute(9).hello
# => Hello, Bob.

puts greet.compute(9, :name => "Ralph").hello
# => Hello, Ralph.

puts greet.compute(9, :hello => "Good evening.").hello
# => Good evening.

Default Number of Functions in Parallel

The num_parallel argument to compute() may be omitted, in which case the num_parallel determination falls to Pure.worker, an object described later in this document.

require 'pure/dsl'

Pure.worker.num_parallel = 2

result = pure do
  def f(x, y)
    x + y
  end

  def x
    33
  end
end.compute(:y => 44)

# compute with 2 parallel threads
puts result.f  # => 77

Delegates and Blocks

When no block is given to compute, it returns a delegate for the computation. The computation results are stored in the lazily-evaluated attributes of the delegate. A computation is performed only when an attribute is requested, and an attribute is never recomputed.

When compute is given a block, the delegate is passed to the block and the return value of compute is the result of the block.

require 'pure/dsl'

geometry = pure do
  def area(width, height)
    width*height
  end

  def width(border)
    7 + border
  end

  def height(border)
    5 + border
  end
end

area = geometry.compute :border => 2 do |result|
  puts result.border     # => 2
  puts result[:border]   # => 2

  puts result.width      # => 9
  puts result.height     # => 7

  result.area
end

puts area  # => 63

Function results may also be accessed with [], as shown in result[:border] above.

Combining Pure Modules

Pure modules are regular Module instances. They may be combined freely with include.

require 'pure/dsl'

greet = pure do
  def hello(name)
    "Hello, #{name}."
  end
end

ralph = pure do
  include greet
  def name
    "Ralph"
  end
end

puts ralph.compute.hello
# => Hello, Ralph.

Dynamic Names

The pseudo-keyword fun is provided for defining a pure function whose name or arguments are unknown at compile time.

require 'pure/dsl'

geometry = pure do
  fun :area => [:width, :height] do |w, h|
    w*h
  end

  def width
    4
  end

  fun :height do
    5
  end
end

puts geometry.compute.area  # => 20

Or more realistically,

require 'pure/dsl'

file_stats = pure do
  files = Dir["*"]

  files.each { |file|
    fun file do
      File.stat(fun_name.to_s)
    end
  }

  fun :total_size => files do |*stats|
    stats.inject(0) { |acc, stat| acc + stat.size }
  end
end

file_stats.compute { |result|
  puts result["Rakefile"].size  # => 505
  puts result.total_size        # => 39355
}

The left of side => is the function name. The right side is an array containing the names of the function arguments. The values of the function arguments are passed to the block.

The next section explains the fun_name call in this example.

Referencing Function and Argument Names

Inside a pure function, fun_name gives the name of the function and arg_names gives the names of its arguments. In the previous example above,

files.each { |file|
  fun file do
    File.stat(fun_name.to_s)
  end
}

Here, fun_name.to_s is exactly the same as file. So why not call File.stat(file)? Pure functions are extracted from their surrounding context and must therefore use function arguments as the sole means of communication. In this case File.stat(file) references file which lies outside the function definition.

The above is strictly not necessary when the default worker (explained later) is used, however the best strategy is to ignore this detail.

Mapping an Enumerable in Parallel

The convenience method fun_map defines an anonymous pure function which is applied to each element of a given enumerable.

require 'pure/dsl'

numbers = pure do
  fun_map :squares => [3, 4, 5] do |n|
    n*n
  end
end

p numbers.compute.squares  # => [9, 16, 25]

The example from the “Dynamic Names” section is more easily written with fun_map,

require 'pure/dsl'

file_stats = pure do
  fun_map :stats => Dir["*"] do |file|
    File.stat(file)
  end

  def total_size(stats)
    stats.inject(0) { |acc, stat| acc + stat.size }
  end
end

puts file_stats.compute(3).total_size  # => 39355

Restrictions

Since the grand scheme of Pure rests upon all functions and function arguments having a name, a pure function defined with def cannot have a *splat argument. Naturally this restriction does not apply to pure functions defined with fun.

require 'pure/dsl'

pure do
  def f(*args)  # => raises Pure::SplatError
  end

  fun :g => [:x, :y] do |*args|  # OK
    args.map { |a| a**2 }
  end
end

A block is never passed to a pure function (except if called manually, of course).

A pure function cannot have default arguments.

A pure function should not reference variables declared outside the function definition.

Background

The user should have a basic understanding of functional programming (see for example en.wikipedia.org/wiki/Functional_programming) and the meaning of side effects.

Every pure function you define must explicitly depend on the data it uses.

#
# BAD example: depending on state DATA.value
#
geometry = pure do
  def area(width, height)
    width*height - DATA.value
  end
end

Unless offset DATA.value is really a constant, the computation result is in general not well-defined.

Just as depending on some changeable state is bad, it is likewise bad to affect a state (to produce a side effect).

#
# BAD example: affecting state
#
geometry = pure do
  def area(width, height)
    ACCUMULATOR.add "more data"
    width*height
  end
end

Given a pure computation where functions are modifying ACCUMULATOR, the end state of ACCUMULATOR is not well-defined, even if the methods of ACCUMULATOR are thread-safe.

Philosophy

Languages which are purely functional (e.g. Haskell) employ special constructs (e.g. monads) for dealing with side-effects. This project is roughly analogous to the converse with respect to Ruby.

Haskell code is pure (non-side-effecting) by default, with non-pure operations being stuffed into monads. Ruby code is non-pure (side-effecting) by default, with pure code being stuffed into pure blocks.

Purpose

Pure has two main goals:

  • Parallelize system-intensive code, e.g. system() calls.

  • Provide a framework for parallelizing Ruby code across an arbitrary number of cores/machines.

Due to the global VM lock in Ruby 1.9, the actual execution of Ruby VM instructions is not parallelized. However when a Ruby thread is blocking during a system call, other threads will be executed. Contrariwise in Ruby 1.8 the whole interpreter is blocked during a system call.

The next section addresses the second point above.

Technical Details

Parser Plugins

Pure uses a parser plugin to extract the def and fun definitions inside a pure module. Three parsers are bundled with Pure,

  • Pure::Parser::Internal – require 'pure/parser/internal' – ruby-1.9.2 only

  • Pure::Parser::Ripper – require 'pure/parser/ripper' – ruby-1.9 only

  • Pure::Parser::RubyParser – require 'pure/parser/ruby_parser' – any ruby

The default is tried in that order.

The current parser may be changed via the Pure.parser attribute. A pure module is tied to a parser when the module is created.

The only requirement for a parser plugin is to properly implement an extract() method.

Worker Plugins

A worker plugin is a class which defines what happens when a pure function is triggered to execute. A worker instance is tied to the computation delegate returned by compute().

The default worker looks like this:

module Pure
  class NativeWorker
    attr_reader :num_parallel

    def define_function_begin(pure_module, num_parallel)
      @num_parallel = num_parallel || self.class.num_parallel
      @class = Class.new Names do
        include pure_module
      end
    end

    def define_function(spec)
      lambda { |*args|
        @class.new(spec[:name], spec[:args]).send(spec[:name], *args)
      }
    end

    def define_function_end
    end

    class << self
      attr_accessor :num_parallel
    end
    @num_parallel = 1
  end
end

The following example illustrates the internals.

require 'pure/dsl'
require 'pure/parser/ruby_parser'

class FakeWorker
  #
  # This method is called for each pure function in the pure module.
  #
  # Returns a lambda which computes the function described by spec.
  #
  # For this fake worker, we just return the function info.
  #
  def define_function(spec)
    lambda { |*args|
      [spec, args]
    }
  end

  #
  # Called before all define_function calls.
  #
  # pure_module is the receiver of compute().
  #
  # num_parallel is the hint passed to compute(), or nil if no
  # hint was given.  A worker is free to ignore it, as we do here.
  #
  def define_function_begin(pure_module, num_parallel)
  end

  #
  # Called after all define_function calls.
  #
  def define_function_end
  end

  #
  # When a computation begins, the parallelizing engine asks the
  # worker how many functions to run in parallel.
  #
  def num_parallel
    2
  end

  class << self
    #
    # A num_parallel hint for this worker.  A worker is free to
    # ignore this as well.
    #
    attr_accessor :num_parallel
  end
end

adder = pure(Pure::Parser::RubyParser) do
  def add(left, right)
    left + right
  end
end

require 'pp'
pp adder.compute(FakeWorker, :left => 33, :right => 44).add

#### output:

[{:name=>:add,
  :args=>[:left, :right],
  :code=>
   s(:defn,
    :add,
    s(:args, :left, :right),
    s(:scope,
     s(:block, s(:call, s(:lvar, :left), :+, s(:arglist, s(:lvar, :right)))))),
  :splat=>false,
  :default=>false,
  :file=>"fake_worker.rb",
  :line=>29,
  :origin=>:def,
  :parser=>"Pure::Parser::RubyParser"},
 [33, 44]]

Note the Pure::Parser::Internal parser will not generate a :code entry, as it just calls Method#parameters and does no parsing.

Compilers

A compiler converts a function spec (the hash in the previous output) into a callable Ruby object.

With the addition of a compiler, we have all the components necessary for distributing computations. A function definition and its inputs may be reconstructed on another Ruby interpreter.

require 'pure/dsl'
require 'pure/parser/ruby_parser'
require 'pure/compiler/ruby_parser'

class ExternalWorker
  def initialize
    @compiler = Pure::Compiler::RubyParser.new
  end

  def define_function(spec)
    lambda { |*args|
      @compiler.evaluate_function(spec, *args)
    }
  end

  def define_function_begin(pure_module, num_parallel)
  end

  def define_function_end
  end

  def num_parallel
    2
  end

  class << self
    attr_accessor :num_parallel
  end
end

pure(Pure::Parser::RubyParser) do
  def add(left, right)
    left + right
  end
end.compute(ExternalWorker, :left => 33, :right => 44) do |result|
  puts result.add  # => 77
end

See github.com/quix/tiamat for an example of a multi-core/multi-machine worker plugin for Pure.

Pure::Compiler::RubyParser uses RubyParser and Ruby2Ruby together with a code transformer (for instantiating fun definitions) to compile a function spec.

Tools

Lobby your local ruby-core representative to add a built-in sexp extractor and compiler. Lacking these tools, we are stuck this inefficient roundabout process:

  • Ruby interpreter parses the source

  • RubyParser (redundantly) parses the source and converts it to an sexp

  • Ruby2Ruby converts the sexp to a string containing Ruby code

  • call eval on the code string

However this could be reduced to (for example):

  • Ruby interpreter parses the source

  • call Proc#to_sexp

  • call Proc.from_sexp

Author

  • James M. Lawrence <quixoticsycophant@gmail.com>

License

Copyright (c) 2009 James M. Lawrence.  All rights reserved.

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation files
(the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.