/sycamore

A tree data structure for Ruby

Primary LanguageRubyMIT LicenseMIT

Sycamore

"The Egyptians' Holy Sycamore also stood on the threshold of life and death, connecting the two worlds."
-- Wikipedia: Tree of Life

Gem Version Travis CI Build Status Coverage Status Inline docs Documentation Gitter Chat License

Sycamore is an implementation of an unordered tree data structure.

Features:

  • easy, hassle-free access to arbitrarily deep nested elements
  • grows automatically when needed
  • familiar Hash interface
  • no more nil-induced errors

Imagine a Sycamore tree as a recursively nested set. The elements of this set, called nodes, are associated with a child tree of additional nodes and so on. This might be different to your usual understanding of a tree, which has to have one single root node, but this notion is much more general. The usual tree is just a special case with just one node at the first level. But I prefer to think of the root to be implicit. Effectively every object is a tree in this sense. You can assume self to be the implicit root.

Restrictions:

  • Only values you would use as keys of a hash should be used as nodes of a Sycamore tree. Although Ruby's official Hash documentation says a Hash allows you to use any object type, one is well advised to use immutable objects only. Enumerables as nodes are explicitly excluded by Sycamore.
  • The nodes are unordered and can't contain duplicates.
  • A Sycamore tree is uni-directional, i.e. has no relationship to its parent.

Why

Trees in the sense of recursively nested sets are omnipresent today. But why then are there so few implementations of tree data structures? The answer is simple: because of Ruby's powerful built-in hashes. The problem is that while Ruby's Hash, as an implementation of the Hash map data structure, might be perfectly fine for flat dictionary like structures, it is not very well-suited for storing tree structures. Ruby's hash literals, which allow it to easily nest multiple hashes, belie this fact. But it catches up when you want to build up a tree with hashes dynamically and have to manage the hash nesting manually.

In contrast to the few existing implementations of tree data structures in Ruby, Sycamores is based on Ruby's very efficient hashes and contains the values directly without any additional overhead. It only wraps the hashes itself. This wrapper object is very thin, containing nothing more than the hash itself. This comes at the price of the aforementioned restrictions, prohibiting it to be a general applicable tree implementation.

Another compelling reason for the use of Sycamore is its handling of nil. Much has been said about the problem of nil (or equivalent null-values in other languages), including: "It was my Billion-dollar mistake" from its founder, Tony Hoare. Every developer has experienced it in the form of errors such as

NoMethodError: undefined method '[]' for nil:NilClass

With Sycamore this is a thing of the past.

Supported Ruby versions

  • MRI >= 2.1
  • JRuby
  • Rubinius

Dependencies

  • none

Installation

The recommended installation method is via RubyGems.

$ gem install sycamore

Usage

I will introduce Sycamore's Tree API by comparing it with Ruby's Hash API.

In the following I'll always write Tree for the Sycamore tree class, instead of the fully qualified Sycamore::Tree. By default, this global Tree constant is not available. If you want this, you'll have to

require 'sycamore/extension'

When you can't or don't to want to have the Tree alias constant in the global namespace, but still want a short alternative name, you can alternatively

require 'sycamore/stree'

to get an alias constant STree with less potential for conflicts.

I recommend trying the following code yourself in a Ruby REPL like Pry.

Creating trees

A Sycamore::Tree can be created similar to Hashes with the standard constructor or the class-level [] operator.

Tree.new creates an empty Sycamore::Tree.

tree = Tree.new
tree.empty?  # => true

No additional arguments are supported at the time. As you'll see, for a Sycamore::Tree the functionality of the Hash constructor to specify the default value behaviour is of too little value to justify its use in the default constructor, so I'd like to reserve them for something more useful.

The [] operator creates a new Tree and adds the arguments as its initial input. It can handle a single node value, a collection of nodes or a complete tree.

Tree[1]           # => #<Sycamore::Tree:0x3fcfe51a5a3c {1=>n/a}>
Tree[1, 2, 3]     # => #<Sycamore::Tree:0x3fcfe51a56f4 {1=>n/a, 2=>n/a, 3=>n/a}>
Tree[1, 2, 2, 3]  # => #<Sycamore::Tree:0x3fcfe51a52d0 {1=>n/a, 2=>n/a, 3=>n/a}>
Tree[x: 1, y: 2]  # => #<Sycamore::Tree:0x3fcfe51a4e34 {:x=>1, :y=>2}>

As you can see in line 3 nodes are stored as a set, i.e. with duplicates removed.

Note that multiple arguments are not interpreted as an associative array as Hash[] does, but rather as a set of leaves, i.e. nodes without children.

Hash[1, 2, 3, 4]  # => {1=>2, 3=>4}
Hash[1, 2, 3]     # => ArgumentError: odd number of arguments for Hash

You can also see that children of leaves, i.e. nodes without children, are signified with n/a. When providing input data with Hashes, you can use nil as the child value of a leaf.

Tree[x: 1, y: 2, z: nil]  
# => #<Sycamore::Tree:0x3fcfe51a4e34 {:x=>1, :y=>2, :z=>n/a}>

In general the nil child value for leaves in Hash literals is mandatory, but on the first level it can be ommitted, by providing the leaves as an argument before the non-leaf nodes.

Tree[:a, :b, c: {d: 1, e: nil}]
# => #<Sycamore::Tree:0x3fd3f9c6bb0c {:a=>n/a, :b=>n/a, :c=>{:d=>1, :e=>n/a}}>

If you really want to have a node with nil as a child, you'll have to put the nil in an array.

Tree[x: 1, y: 2, z: [nil]]  
# => #<Sycamore::Tree:0x3fd641858264 {:x=>1, :y=>2, :z=>nil}>

Accessing trees

Access to elements of a Sycamore::Tree is mostly API-compatible to that of Rubys Hash class. But there is one major difference in the return type of most of the access methods: Since we are dealing with a recursively defined tree structure, the returned children are always trees as well.

The main method for accessing a tree is the [] operator.

tree = Tree[x: 1, y: {2 => "a"}]

tree[:x]    # => #<Sycamore::Tree:0x3fea48d24d40 {1=>n/a}>
tree[:y]    # => #<Sycamore::Tree:0x3fea48d24b74 {2=>"a"}>
tree[:y][2] # => #<Sycamore::Tree:0x3fea48d248f4 {"a"=>n/a}>

The actual nodes of a tree can be retrieved with the method nodes.

tree.nodes  # => [:x, :y]
tree[:x].nodes  # => [1]
tree[:y].nodes  # => [2]
tree[:y][2].nodes  # => ["a"]

If it's certain that a tree has at most one element, you can also use node to get that node directly.

tree[:y].node     # => 2
tree[:y][2].node  # => "a"
tree[:x][1].node  # => nil
tree.node  # Sycamore::NonUniqueNodeSet: multiple nodes present: [:x, :y]

The bang variant node! raises an error when the node set is empty, instead of returning nil.

tree[:y][2].node!  # => "a"
tree[:x][1].node!  # => # Sycamore::EmptyNodeSet: no node present

As opposed to Hash, the [] operator of Sycamore::Tree also supports multiple arguments which get interpreted as a path.

tree[:y, 2].node  # => "a"

For compatibility with Ruby 2.3 Hashes, this can also be done with the dig method.

tree.dig(:y, 2).node  # => "a"

fetch, as a more controlled way to access the elements, is also supported.

tree.fetch(:x)               # => #<Sycamore::Tree:0x3fea48d24d40 {1=>n/a}>
tree.fetch(:z)               # => KeyError: key not found: :z
tree.fetch(:z, :default)     # => :default
tree.fetch(:z) { :default }  # => :default

Fetching the child of a leaf behaves almost the same as fetching the child of a non-existing node, i.e. the default value is returned or a KeyError gets raised. In order to differentiate these cases, a Sycamore::ChildError as a subclass of KeyError is raised when accessing the child of a leaf.

fetch_path allows a dig similar access with fetch semantics, except it requires the path of nodes to be given as an Enumerable.

tree.fetch_path([:y, 2]).node  # => "a"
tree.fetch_path([:y, 3])               # => KeyError: key not found: 3
tree.fetch_path([:y, 3], :default)     # => :default
tree.fetch_path([:y, 3]) { :default }  # => :default

The number of nodes of a tree can be determined with size. This will only count direct nodes.

tree.size  # => 2

total_size or its short alias tsize returns the total number of nodes of a tree, including the nodes of children.

tree.total_size  # => 5
tree[:y].tsize   # => 2

The height of a tree, i.e. the length of its longest path can be computed with the method height.

tree.height  # => 3

empty? checks if a tree is empty.

tree.empty?         # => false
tree[:x, 1].empty?  # => true

leaf? checks if a node is a leaf.

tree.leaf? :x     # => false
tree[:x].leaf? 1  # => true

leaves? (or one of its aliases external? and flat?) can be used to determine this for more nodes at once.

Tree[1, 2, 3].leaves?(1, 2)  # => true

Without any arguments leaves? returns whether all nodes of a tree are leaves.

Tree[1, 2].leaves?  # => true

include? checks whether one or more nodes are in the set of nodes of this tree.

tree.include? :x        # => true
tree.include? [:x, :y]  # => true

include? can also check whether a tree structure (incl. a hash) is a sub tree of a Sycamore::Tree.

tree.include?(x: 1, y: 2)  # => true

to_h returns the tree as a Hash.

tree.to_h  # => {:x=>1, :y=>{2=>"a"}}

Accessing absent trees

There is another major difference in the access method behaviour of a Scyamore tree in comparison to hashes: The child access methods even return a tree when it does not exist. When you ask a hash for a non-existent element with the [] operator, you'll get a nil, which is an incarnation of the null-problem and the cause of many bug tracking sessions.

hash = {x: 1, y: {2 => "a"}}
hash[:z]  # => nil
hash[:z][3]  # => NoMethodError: undefined method `[]' for nil:NilClass

Sycamore on the other side returns a special tree, the Nothing tree:

tree = Tree[x: 1, y: {2 => "a"}]
tree[:z]     # => #<Sycamore::Nothing>
tree[:z][3]  # => #<Sycamore::Nothing>

Sycamore::Nothing is a singleton Tree implementing a null object. It behaves on every query method call like an empty tree.

Sycamore::Nothing.empty?  # => true
Sycamore::Nothing.size    # => 0
Sycamore::Nothing[42]     # => #<Sycamore::Nothing>

Sycamore adheres to a strict command-query-separation (CQS). A method is either a command changing the state of the tree and returning self or a query method, which only computes and returns the results of the query, but leaves the state unchanged. The only exception to this strict separation is made, when it is necessary in order to preserve Hash compatibility. All query methods are supported by the Sycamore::Nothing tree with empty tree semantics.

Among the command methods are two subclasses: additive command methods, which add elements and destructive command methods, which remove elements. These are further refined into pure additive and pure destructive command methods, which either support additions or deletions only, not both operations at once. The Sycamore::Tree extends Ruby's reflection API with class methods to retrieve the respective methods: query_methods, command_methods, additive_command_methods, destructive_command_methods, pure_additive_command_methods, pure_destructive_command_methods.

Tree.command_methods
# => [:add, :<<, :replace, :create_child, :[]=, :delete, :>>, :clear, :compact, :replace, :[]=, :freeze]
Tree.additive_command_methods
# => [:add, :<<, :replace, :create_child, :[]=]
Tree.pure_additive_command_methods
# => [:add, :<<, :create_child]
Tree.pure_destructive_command_methods
# => [:delete, :>>, :clear, :compact]

Pure destructive command methods on Sycamore::Nothing are no-ops. All other command methods raise an exception.

Sycamore::Nothing.clear  # => #<Sycamore::Nothing>
Sycamore::Nothing[:foo] = :bar  
# => Sycamore::NothingMutation: attempt to change the Nothing tree

But inspecting the Nothing tree returned by Tree#[] further shows, that this isn't the end of the story.

tree[:z].inspect
# => absent child of node :z in #<Sycamore::Tree:0x3fc88e04a470 {:x=>1, :y=>{2=>"a"}}>
tree[:z][3].inspect
# => absent child of node 3 in absent child of node :z in #<Sycamore::Tree:0x3fc88e04a470 {:x=>1, :y=>{2=>"a"}}>

We'll actually get an Absence object, a proxy object for the requested not yet existing tree. As long as we don't try to change it, this Absence object delegates all method calls to Sycamore::Nothing. But as soon as we call a non-pure-destructive command method, the missing tree will be created, added to the parent tree and the method call gets delegated to the now existing tree.

tree[:z] = 3
tree.to_h  # => {:x=>1, :y=>{2=>"a"}, :z=>3}

So a Sycamore::Tree is a tree, on which the nodes grow automatically, but only when needed. And this works recursively on arbitrarily deep nested absent trees.

tree[:some][:very][:deep] = :node
tree.to_h  # => {:x=>1, :y=>{2=>"a"}, :z=>3, :some=>{:very=>{:deep=>:node}}}

In order to determine whether a node has no children, you can simply use empty?.

tree = Tree[a: 1]
tree[:a].empty?  # => false
tree[:b].empty?  # => true  

But how can you distinguish an empty from a missing tree?

user = Tree[name: 'Adam', shopping_cart_items: []]

user[:shopping_cart_items].empty?  # => true
user[:foo].empty?                  # => true

One way is the use of the absent? method, which only returns true on an Absence object.

user[:shopping_cart_items].absent?  # => false
user[:foo].absent?                  # => true

Another possibility, without the need to create the Absence in the first place is the leaf? method, since it also checks for the presence of a node.

user.leaf? :shopping_cart_items         # => true
user.leaf? :foo                         # => false

But the leaf? method has as similar problem in this respect: it doesn't differentiate between absent and empty children.

tree = Tree[foo: nil, bar: []]
tree.leaf? :foo         # => true
tree.leaf? :bar         # => true

strict_leaf? and strict_leaves? (or their short aliases sleaf? and sleaves?) are more strict in this regard: when a node has an empty child tree it is considered a leaf, but not a strict leaf.

tree.strict_leaf? :foo  # => true
tree.strict_leaf? :bar  # => false

Besides absent?, the congeneric methods blank? (as an alias of empty?) and its negation present? are ActiveSupport compatible available. Unfortunately, the natural expectation of Tree#present? and Tree#absent? to be mutually opposed leads astray.

user[:shopping_cart_items].absent?   # => false
user[:shopping_cart_items].present?  # => false

The risks rising from an ActiveSupport incompatible present? is probably greater then this inconsistence. So, if you want check if a tree is not absent, use existent? as the negation of absent?.

Beside these options, fetch is also a method to handle this situation in a nuanced way.

user.fetch(:shopping_cart_items)  # => #<Sycamore::Tree:0x3febb9c9b3d4 {}>
user.fetch(:foo)                            
# => KeyError: key not found: :foo
user.fetch(:foo, :default)  # => :default

Empty child trees also play a role when determining equality. The eql? and == equivalence differ exactly in their handling of this question: == treats empty child trees as absent trees, while eql? doesn't.

Tree[:foo].eql? Tree[foo: []]  # => false
Tree[:foo] == Tree[foo: []]    # => true

All empty child trees can be removed with compact.

Tree[:foo].eql? Tree[foo: []].compact  # => true

An arbitrary structure can be compared with a Sycamore::Tree for equality with ===.

Tree[:foo, :bar] === [:foo, :bar]      # => true
Tree[:foo, :bar] === Set[:foo, :bar]   # => true
Tree[:foo => :bar] === {:foo => :bar}  # => true

Changing trees

Let's examine the command methods to change the contents of a tree. The add method or the << operator as its alias allows the addition of one, multiple or a tree structure of nodes.

tree = Tree.new
tree << 1
tree << [2, 3]
tree << {3 => :a, 4 => :b}
puts tree 
> Tree[1=>nil, 2=>nil, 3=>:a, 4=>:b]

The []= operator is Hash-compatible supported.

tree[5] = :c
puts tree 
> Tree[1=>nil, 2=>nil, 3=>:a, 4=>:b, 5=>:c]

Note that this is just an add with a previous call of clear, which deletes all elements of the tree. This means, you can safely assign another tree without having to think about object identity.

If you want to explicitly state, that a node doesn't have any children, you can specify it in the following equivalent ways.

tree[:foo] = []
tree[:foo] = {}

To remove a child tree entirely, you can assign Nothing or nil to the parent node.

tree[:foo] = Nothing
tree[:foo] = nil

If you really want to overwrite the current child nodes with a single nil node, you have to do it in the following way.

tree[:foo] = [nil]

Note that all of these values are interpreted consistently inside input tree structures on creation, addition, deletion etc., i.e. empty Enumerables become empty child trees, Nothing or nil are used as place holders for the explicit negation of a child and [nil] is used for a child trees with a single nil node.

puts Tree[ a: { b: nil }, c: { d: []}, d: [nil] ]
>Tree[:a=>:b, :c=>{:d=>[]}, :d=>[nil]]

Beside the deletion of all elements with the already mentioned clear method, single or multiple nodes and entire tree structures can be removed with delete or the >> operator.

tree >> 1
tree >> [2, 3]
tree >> {4 => :b}
puts tree 
> Tree[5=>:c, :foo=>[]]

When removing a tree structure, only child trees with no more existing nodes get deleted.

tree = Tree[a: [1,2]]
tree >> {a: 1}
puts tree 
> Tree[:a=>2]

tree = Tree[a: 1, b: 2]
tree >> {a: 1}
puts tree 
> Tree[:b=>2]

Iterating trees

The fundamental each and with that all Enumerable methods behave Hash-compatible.

tree = Tree[ 1 => {a: 'foo'}, 2 => :b, 3 => nil ]
tree.each { |node, child| puts "#{node} => #{child}" }

> 1 => Tree[:a=>"foo"]
> 2 => Tree[:b]
> 3 => Tree[]

each_path iterates over all paths to leafs of a tree.

tree.each_path { |path| puts path }

> #<Path: /1/a/foo>
> #<Path: /2/b>
> #<Path: /3>

The paths are represented by Sycamore::Path objects and are basically an Enumerable of the nodes on the path, specifically optimized for the enumeration of the set of paths of a tree. It does this, by sharing nodes between the different path objects. This means in the set of all paths, every node is contained exactly once, even the internal nodes being part of multiple paths.

Tree['some possibly very big data chunk' => [1, 2]].each_path.to_a
# => [#<Sycamore::Path["some possibly very big data chunk",1]>,
#     #<Sycamore::Path["some possibly very big data chunk",2]>]

Searching in trees

search returns the set of all paths to child trees containing a node or tree.

tree = Tree[ 1 => {a: 'foo'}, 2 => :b, 3 => [:a, :b, :c] ]
tree.search :a        # => [#<Sycamore::Path[1]>, #<Sycamore::Path[1]>]
tree.search a: 'foo'  # => [#<Sycamore::Path[1]>]

If you search for multiple nodes, only the paths to child trees containing all of the given nodes are returned.

tree.search [:b, :c]  # => [#<Sycamore::Path[3]>]

All Tree methods for which it makes sense accept path objects as input instead or in combination with nodes or tree structures. This allows it to apply the search results to any of these methods.

Getting help

Contributing

see CONTRIBUTING for details.

License and Copyright

(c) 2015-2016 Marcel Otto. MIT Licensed, see LICENSE for details.