/unboxed

Unboxed arrays for OCaml

Primary LanguageOCaml

Unboxed

An implementation of unboxed data structures for OCaml by Robert Atkey bob.atkey@gmail.com.

The runtime representation of OCaml arrays usually stores the elements using a 'boxed' representation, meaning that if the value of a particular element cannot be stored within a single machine word (i.e., it is not an argument-less constructor or an int value), then the array actually contains a pointer to the representation of the element elsewhere in the heap. (Note that float arrays are special cased.) This boxing representation leads to an extra indirection when accessing elements of the array, means that memory is wasted, and decreases data locality, to the detriment of efficent cache usage.

The unboxed library implements functors that generate new abstract types representing arrays, whose run-time representation is 'unboxed', so that each of the elements in stored inline in the array. The library uses unsafe operations from the Obj module in its implementation, but this unsafety is completely hidden from clients behind abstract types.

Note that this implementation has not yet been benchmarked, and so can only be regarded as a proof-of-concept for implementing unboxed arrays. In particular, it is not clear to me that the overhead of allocating Data constructor (see the Element_Descriptor signature below) is not too expensive.

In the future, it would be nice to explore alternative representations for more specialised representations of unboxed arrays. For example, using bitmaps to representation occupancy information for unboxed 'a option arrays.

To build

$ ocaml setup.ml -build

There are no dependencies beyond plain OCaml (version 4.01.0 tested), and findlib.

Interface

Unboxed array implementations are generated by using the following functor (in the module Unboxed.MonomorphicVariantArray):

module Make (Elt : Element_Descriptor) : S with type elt = Elt.t

where the signature S describes a basic imperative array interface:

module type S = sig
  type t
  type elt 
  val create : int -> elt -> t
  val init   : int -> (int -> elt) -> t
  val get    : t -> int -> elt
  val set    : t -> int -> elt -> unit
  val length : t -> int
end

The element type is described using modules matching the following signature. Modules implementing this signature set out (a) the allowable constructors for the elements; (b) the types of the data associated with each constructor; (c) the maximum size of an element; and (d) the specific width associated with each constructor.

module type Element_Descriptor = sig
  type 'a constructor
  type size
  val size : size is_a_natural
  val width_of : 'a constructor -> ('a, size) width
  type t = Data : 'a constructor * 'a -> t
end

The type 'a constructor is usually a GADT (Generalised Algebraic Datatype) that uses the type parameter to describe the types of the data associated with that constructor. The type size is expected to be of the form zero succ ... succ, and represents in unary notation the maximum size of any element. The value size is a runtime representation of the number represented by size. The function width_of describes the necessary width to store the data associated with each constructor, with a static check that all the widths are less than size. Finally, the type t defines a GADT that represents 'boxed' representations used to represent element values outwith the array.

The types is_a_natural and width are defined in Unboxed module.

The functor Unboxed.PolymorphicVariantArray.Make is similar to the functor described above, but generates arrays with polymorphic element types.

A Simple Example

The following implementation of Element_Descriptor describes a datatype with three constructors:

module Elt = struct
  open Unboxed

  type 'a constructor =
    | Empty   : unit constructor
    | Deleted : unit constructor
    | Value   : (int * string) constructor

  type size = zero succ succ
  let size = Succ (Succ Zero)

  let width_of (type d) (c : d constructor) : (d,size) width =
      match c with
        | Empty   -> Width0
        | Deleted -> Width0
        | Value   -> Width2

  type t = Data : 'd constructor * 'd -> t
end

module A = Unboxed.MonomorphicVariantArray.Make (Elt)

open Elt

Values of type A.t now represent boxed values of type Elt.t, and can be manipulated through the interface S with type elt = Elt.t, in a fairly natural way:

# let arr = A.create 10 (Data (Empty, ()));;
val arr : A.t = <abstr>
# A.set arr 0 (Data (Value, (1, "foo")));;
- : unit = ()
# A.set arr 5 (Data (Deleted, ()));;
- : unit = ()
# let elt_to_string arr i =
    match A.get arr i with
      | Data (Empty, ()) -> "Empty"
      | Data (Deleted, ()) -> "Deleted"
      | Data (Value, (k,v)) -> Printf.sprintf "Value (%d,%S)" k v;;
val elt_to_string : A.t -> int -> string = <fun>
# elt_to_string arr 0 |> print_endline;;
Value (1,"foo")
- : unit = ()
# elt_to_string arr 1 |> print_endline;;
Empty
- : unit = ()
# elt_to_string arr 5 |> print_endline;;
Deleted
- : unit = ()
# 

There is a more substantial example using unboxed arrays to implement a hashtable with linear probing in the file hashtable.ml. This example also demonstrates the use of the Unboxed.PolymorphicVariantArray.Make functor for generating polymorphic unboxed array types.