An implementation of unboxed data structures for OCaml by Robert Atkey bob.atkey@gmail.com.
The runtime representation of OCaml arrays usually stores the elements
using a 'boxed' representation, meaning that if the value of a
particular element cannot be stored within a single machine word
(i.e., it is not an argument-less constructor or an int
value), then
the array actually contains a pointer to the representation of the
element elsewhere in the heap. (Note that float
arrays are special
cased.) This boxing representation leads to an extra indirection when
accessing elements of the array, means that memory is wasted, and
decreases data locality, to the detriment of efficent cache usage.
The unboxed
library implements functors that generate new abstract
types representing arrays, whose run-time representation is 'unboxed',
so that each of the elements in stored inline in the array. The
library uses unsafe operations from the Obj
module in its
implementation, but this unsafety is completely hidden from clients
behind abstract types.
Note that this implementation has not yet been benchmarked, and so can
only be regarded as a proof-of-concept for implementing unboxed
arrays. In particular, it is not clear to me that the overhead of
allocating Data
constructor (see the Element_Descriptor
signature
below) is not too expensive.
In the future, it would be nice to explore alternative representations
for more specialised representations of unboxed arrays. For example,
using bitmaps to representation occupancy information for unboxed 'a option array
s.
$ ocaml setup.ml -build
There are no dependencies beyond plain OCaml (version 4.01.0 tested), and findlib.
Unboxed array implementations are generated by using the following
functor (in the module Unboxed.MonomorphicVariantArray
):
module Make (Elt : Element_Descriptor) : S with type elt = Elt.t
where the signature S
describes a basic imperative array interface:
module type S = sig
type t
type elt
val create : int -> elt -> t
val init : int -> (int -> elt) -> t
val get : t -> int -> elt
val set : t -> int -> elt -> unit
val length : t -> int
end
The element type is described using modules matching the following signature. Modules implementing this signature set out (a) the allowable constructors for the elements; (b) the types of the data associated with each constructor; (c) the maximum size of an element; and (d) the specific width associated with each constructor.
module type Element_Descriptor = sig
type 'a constructor
type size
val size : size is_a_natural
val width_of : 'a constructor -> ('a, size) width
type t = Data : 'a constructor * 'a -> t
end
The type 'a constructor
is usually a GADT (Generalised Algebraic
Datatype) that uses the type parameter to describe the types of the
data associated with that constructor. The type size
is expected
to be of the form zero succ ... succ
, and represents in unary
notation the maximum size of any element. The value size
is a
runtime representation of the number represented by size
. The
function width_of
describes the necessary width to store the data
associated with each constructor, with a static check that all the
widths are less than size
. Finally, the type t
defines a GADT that
represents 'boxed' representations used to represent element values
outwith the array.
The types is_a_natural
and width
are defined in Unboxed
module.
The functor Unboxed.PolymorphicVariantArray.Make
is similar to the
functor described above, but generates arrays with polymorphic element
types.
The following implementation of Element_Descriptor
describes a
datatype with three constructors:
module Elt = struct
open Unboxed
type 'a constructor =
| Empty : unit constructor
| Deleted : unit constructor
| Value : (int * string) constructor
type size = zero succ succ
let size = Succ (Succ Zero)
let width_of (type d) (c : d constructor) : (d,size) width =
match c with
| Empty -> Width0
| Deleted -> Width0
| Value -> Width2
type t = Data : 'd constructor * 'd -> t
end
module A = Unboxed.MonomorphicVariantArray.Make (Elt)
open Elt
Values of type A.t
now represent boxed values of type Elt.t
, and
can be manipulated through the interface S with type elt = Elt.t
, in
a fairly natural way:
# let arr = A.create 10 (Data (Empty, ()));;
val arr : A.t = <abstr>
# A.set arr 0 (Data (Value, (1, "foo")));;
- : unit = ()
# A.set arr 5 (Data (Deleted, ()));;
- : unit = ()
# let elt_to_string arr i =
match A.get arr i with
| Data (Empty, ()) -> "Empty"
| Data (Deleted, ()) -> "Deleted"
| Data (Value, (k,v)) -> Printf.sprintf "Value (%d,%S)" k v;;
val elt_to_string : A.t -> int -> string = <fun>
# elt_to_string arr 0 |> print_endline;;
Value (1,"foo")
- : unit = ()
# elt_to_string arr 1 |> print_endline;;
Empty
- : unit = ()
# elt_to_string arr 5 |> print_endline;;
Deleted
- : unit = ()
#
There is a more substantial example using unboxed arrays to implement
a hashtable with linear probing in the file hashtable.ml
. This
example also demonstrates the use of the
Unboxed.PolymorphicVariantArray.Make
functor for generating
polymorphic unboxed array types.