/Open-Closed-Interval-Tree

Open-Closed Balanced Interval Tree (red-black implementation)

Primary LanguageC#MIT LicenseMIT

Introduction

This is an implementation of augmented Interval Trees stored in a balanced red and black tree. Most of the implementation is explained in detail in Introduction to Algorithms by T. Cormen et al. with the addition of open-closed intervals

The source has been compiled both on .NET and Mono.

Functionality

The tree knows how to store intervals of any IComparable type. Both insertion and deletion are implemented. Querying is possible for all intervals containing a point or all intervals overlapping with an interval.

Implementation

Memory

Each interval is stored in one node which takes a constant amount of memory, and no redundant nodes are needed, thus the memory complexity is O(n).

Leaf pointers

The leaves of an interval tree are formally NULL nodes, some implementations choose to create dummy nodes instead of leaving null pointers in the real tree leaves. This implementation uses the concept of a sentinel node - a common node to which all null pointers point. The Left, Right and Parent nodes of the Sentinel node is itself.

This makes it easier to implement the rest of the features, since it can be manipulated as an ordinary node and yet maintains the behaviour that a null pointer would have.

Insertion

Insertion finds an empty spot for the element (it will do nothing if it already is in the tree), attaches the interval and rebalances the tree.

Insertion is O(lg n). Descending to a free spot costs a maximum of lg N steps (from the balanced nature of the red-black tree). RenewConstraintsAfterInsert called to rebalance the tree runs in O(1) as only local rotation and recoloring is performed, in the worst case it may bubble up to the root, which gives a total of O(lg n). Together, O(lg n + lg n) is still O(lg n).

In addition to an ordinary BST, during insertion, the MaxEnd (holding the maximum interval endpoint value of the node's subtree) property of nodes must be updated. This can be easily accomplished in O(1), since all operations are local and properties of the respective node subtrees are maintained constant.

Deletion

Similarly, deletion also runs in O(lg n). O(lg n) is the time necessary to find the value to be deleted. There are three possible cases:

  • the node has both children - the successor (minimum node of right subtree) is taken and replaces the node, the node, now in the place of the successor, must have only one child, converting it to the the next case:
  • the node has one child or none - we take the existing child (or the left one if one) and replace the node with it

Deletion then runs exactly as in an ordinary R-B tree. The only difference is the maintaining of the MaxEnd property, but that is taken care of in the rotation procedures common with Insertion.

Rebalancing starts with the replaced node (taking O(1) time, as all rotations are local) and may bubble up to the root, taking O(lg n) time.

By examining all the possible scenarios, during both operations, all the tree constraints are maintained ensuring the necessary time complexities for traversing the tree.

Querying for a single point

As a point can be present in all of the intervals in the tree, worst case querying is O(N). To reduce the amount of tests and traversing, two scenarios can be cut off - when the value is larger than the MaxEnd value of the node (then it cannot be in the whole subtree) or if its smaller than the start point of the interval of a node, then it cannot be in the right subtree.

Querying for first overlapping interval

The fastest check is whether an overlapping interval exists (e.g. to use when looking for a conflict in a time schedule). The search method either stops when it reaches the sentinel node (no matches) or when it finds an overlapping match. If such exists, it always returns it.

The query runs in O(lg n), since following one path is enough to determine if the tree has an overlapping interval.

Searching is done by descending down the tree. In each node, the following invariant holds: If the tree contains an overlapping interval, it is located in the subtree of the node. When starting from root, it holds.

During descent, one of two paths are followed. The left subtree is ommited if the start of the interval is greater than the MaxEnd of the left node: clearly it cannot be located there. If the search goes Left, there must be an overlapping interval in the left subtree as there exists and interval i, whose end value equals left's MaxEnd.

Open-Closed intervals

The addition of IntervalType allows for the endpoints to be either excluded (open) or included (closed) from the query. An open interval does not include its endpoints, and is indicated with parentheses. For example, (10,15) means greater than 10 and less than 15. A closed interval does includes all its limit points, and is denoted with square brackets. For example, [10,15] means greater than or equal to 10 and less than or equal to 15. A half-open interval includes only one of its endpoints, and is denoted by mixing the notations for open and closed intervals. (10,15] means greater than 10 and less than or equal to 15, while [10,15) means greater than or equal to 10 and less than 15.

Building

On Windows, the application is prepared as a VS 2010 project. On Mono, clone the repository and run xbuild.