/saturn

Lock-free data structures for multicore OCaml

Primary LanguageOCamlISC LicenseISC

API Reference · Benchmarks · Stdlib Benchmarks

Saturn — Parallelism-Safe Data Structures for Multicore OCaml

This repository is a collection of parallelism-safe data structures for OCaml 5. They are contained in two packages:

  • Saturn that includes all data structures (including the lock-free ones) and should be used by default if you just want parallelism-safe data structures;
  • Saturn_lockfree that includes only lock-free data structures.

It aims to provide an industrial-strength, well-tested (and possibly model-checked and verified in the future), well documented, and maintained parallelism-safe data structure library. We want to make it easier for Multicore OCaml users to find the right data structures for their uses.

Saturn is published on opam and is distributed under the ISC license.

OCaml-CI Build Status GitHub release (latest by date) docs

Contents

Installation

Getting OCaml 5.0

You'll need OCaml 5.0.0 or later. Note that Saturn also works with OCaml 4.14 but only for compatibility reasons, as there is no need for parallelism-safe data structures without OCaml 5.0.

To install OCaml 5.0 yourself, first make sure you have opam 2.1 or later. You can run this command to check:

opam --version

Then use opam to install OCaml 5.0.0:

opam switch create 5.0.0

If you want a later version, you can run the following line to get a list of all available compiler versions:

opam switch list-available

Getting Saturn

saturn can be installed from opam:

opam install saturn

or

opam install saturn_lockfree

if you prefer to use only lock-free data structures.

Introduction

Provided data structures

Name Module in Saturn
(in Saturn_lockfree)
Description Sources
Treiber Stack Stack (same) A classic multi-producer multi-consumer stack, robust and flexible. Recommended starting point when needing a LIFO structure
Michael-Scott Queue Queue (same) A classic multi-producer multi-consumer queue, robust and flexible. Recommended starting point when needing a FIFO structure. Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
Chase-Lev Work-Stealing Dequeue Work_stealing_deque (same) Single-producer, multi-consumer, dynamic-size, double-ended queue (deque). Ideal for throughput-focused scheduling using per-core work distribution. Note, pop and steal follow different ordering (respectively LIFO and FIFO) and have different linearisation contraints. Dynamic Circular Work-Stealing Deque and Correct and Efficient Work-Stealing for Weak Memory Models)
SPSC Queue Single_prod_single_
cons_queue
(same)
Simple single-producer single-consumer fixed-size queue. Thread-safe as long as at most one thread acts as producer and at most one as consumer at any single point in time.
MPMC Bounded Relaxed Queue Relaxed_queue (same) Multi-producer, multi-consumer, fixed-size relaxed queue. Optimised for high number of threads. Not strictly FIFO. Note, it exposes two interfaces: a lockfree and a non-lockfree (albeit more practical) one. See the mli for details.
MPSC Queue Single_consumer_queue (same) A multi-producer, single-consumer, thread-safe queue without support for cancellation. This makes a good data structure for a scheduler's run queue. It is used in Eio. It is a single consumer version of the queue described in Implementing Lock-Free Queues.

Motivation

The following part is a beginner-friendly example to explain why we need data structures specifically designed for multicore programming.

Let's write a classic mutable data structure : a queue. We are going to do a basic implementation the way it may be done for sequential use and show why it is not working well with multiple domains.

type queue = int list ref

let create () : queue = ref []

let push q a = q := a :: !q

What happens if we try to use this queue with multiple domains? First, let's define the work we want a single domain to do: each domain will push 10 times its id in the queue.

let work id q = for i = 0 to 9 do push q id done

Then let's define our test : it spawns 2 domains that each execute work in parallel. test returns the content of the queue as well as its length, so we can easily see if it contains the 20 elements we expect.

 let test () =
   let q = create () in
   let domainA = Domain.spawn (fun () -> work 1 q) in
   let domainB = Domain.spawn (fun () -> work 2 q) in
   Domain.join domainA;
   Domain.join domainB;
   (List.length !q, !q)

Let's try it :

# test ()
- : int * int list =
(20, [2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1])

Everything seems fine, right? Except, it is not running in parallel, as we can see from the consecutive 2 (pushed by domainB) and 1 pushed by domainA. This is because spawning a domain takes way more time than executing work, so domainA is long finished before domainB is even spawned. One way to overpass this issue is to increase the amount of work done in work (for example, by pushing more elements). Another way is to make sure the domains wait for each other before beginning their workload.

We use a basic barrier implementation to do that. Each domain will now wait for the other to reach the barrier before beginning to push.

let work_par id barrier q =
  Barrier.await barrier;
  for i = 0 to 9 do
    push q id
  done

The test function is now:

let test_par () =
  let barrier = Barrier.create 2 in
  let q = create () in
  let domainA = Domain.spawn (fun () -> work_par 1 barrier q) in
  let domainB = Domain.spawn (fun () -> work_par 2 barrier q) in
  Domain.join domainA;
  Domain.join domainB;
  (List.length !q, !q)

Let's run it:

# test_par ();;
- : int * int list =
(18, [2; 1; 2; 1; 2; 1; 2; 1; 2; 1; 2; 2; 1; 1; 1; 1; 2; 1])

Now, the 1 and the 2 are interleaved: domains are running in parallel. The resulting queue however only contains 18 elements whereas 20 were pushed. This is because we not only have a race condition here but also because push is a non-atomic operation. It requires first to read the content of the queue (!q) then to write in it (q := ...). So, for example, when two domains try to push in parallel into an empty queue the following sequence can happen:

  • domain A reads the queue : it is empty
  • domain B reads the queue : it is still empty
  • domain A pushes 1 on the empty queue it has read before
  • domain B pushes 2 on the empty queue it has read before.

This sequence results in a queue containing only one element of value 2. The element pushed by A is lost because B did not see it.

This is a very common issue in parallel programming. To prevent it, functions need to be atomically consistent (aka linearisable), meaning they must have a linearisation point at which they appear to occur instantly. Such functions can be written with different techniques, including:

  • use of Atomic for mutable variables,
  • use of a mutual exclusion mechanism like Mutex.

However both solutions have their limits. Using mutexes or locks open the way to deadlock, livelock, priority inversion, etc; it also often restrics considerably the performance gained by using multiple cores as the parts of the code effectively running in parallel is limited. On the other hand, atomics are - without a complex algorithm to combine them - only a solution for a single shared variable.

Let's try to replace references by atomics in our code to demonstrate this point:

type queue = int list Atomic.t

let create () : queue = Atomic.make []

let push (q : queue) a =
  let curr = Atomic.get q in
  let prev = a :: curr in
  Atomic.set q prev

We still need to read and write to fulfill the whole push operation.

# test ();;
- : int * int list = (15, [1; 1; 1; 1; 1; 2; 1; 1; 1; 2; 2; 2; 2; 1; 2])

and, as expected it is not working. The interleaving scenario described previously can still happen, meaning our function is not linearisable (or atomically consistent). Note that, though it is not observable here, this is still better than the previous implementation, as we are now race-free (see here for a quick note about races in OCaml). As a matter of fact, writting a queue with push and pop functions that are both atomic and parallelism-safe is not as easy as it might sound and often requires advanced techniques to perform well. This is the type of algorithms Saturn_lockfree provided.

To continue with our example, here is how it will be written using the queue provided in Saturn.

let work_saturn id barrier q () =
  Barrier.await barrier;
  for i = 0 to 9 do
    Saturn.Queue.push q id
  done

let test_saturn () =
  let barrier = Barrier.create 2 in
  let q = Saturn.Queue.create () in
  let d1 = Domain.spawn (work_saturn 1 barrier q) in
  let d2 = Domain.spawn (work_saturn 2 barrier q) in
  Domain.join d1;
  Domain.join d2;
  let rec pop_all acc =
    match Saturn.Queue.pop q with
    | None -> List.rev acc
    | Some elt -> pop_all (elt :: acc)
  in
  let res = pop_all [] in
  (List.length res, res)

Running it results in the expected result:

# test_saturn ();;
- : int * int list =
(20, [2; 2; 1; 2; 2; 2; 2; 2; 2; 1; 2; 1; 2; 1; 1; 1; 1; 1; 1; 1])

A note about races in OCaml

Because of the great properties of OCaml 5 memory model (see the OCaml Manual for more details), not a lot can go wrong here. At least, data corruption or segmentation fault won't happen like it can in other languages.

Safe and unsafe data structures

Some data structures are available in two versions: a normal version and a more optimized but unsafe version. The unsafe version utilizes Obj.magic in a way that may be unsafe with flambda2 optimizations.

The reason for providing the unsafe version is that certain optimizations require features that are currently not available in OCaml, such as arrays of atomics or atomic fields in records. We recommend using the normal version of a data structure unless its performance is not sufficient for your use case. In that case, you can try the unsafe version.

Currently, there are two data structures with an unsafe version:

  • Single_cons_single_prod_unsafe: a single consumer single producer queue
  • Queue_unsafe: a Michael Scott queue

Usage

This part describes how to use the provided data structures, and more exactly, what not to do with them. Two main points are discussed:

  • some data structures have restrictions on what operations can be performed in a single domain or a set of domains
  • the currently provided data structures are non-composable

Data Structures With Domain Roles

There are several provided data structures that are intended to be used in combination with a specific domain configuration. These restrictions make the corresponding implementation optimized but not respected them may break safety properties. Obviously, these restrictions are not only described in the documentation but also on the name of the data structure itself. For example, a single consumer queue can only have a single domain popping at any given time.

Let's take the example of Single_prod_single_cons_queue. As suggested by the name, it should be used with only one domain performing push (a producer) and one domain performing pop (a consumer) at the same time. Having two or more domains simultaneously perform pop (or push) will break the safety properties of the queue and more likely result in unexpected behaviors.

Let's say we give a bad alias to this queue and misuse it.

module Queue = Saturn.Single_prod_single_cons_queue

Each domain is going to try to push 10 times in parallel.

let work id barrier q =
  Barrier.await barrier;
  for i = 0 to 9 do
    Queue.push q id
  done

Our test function returns the queue after two domains try to simustaneously push.

let test () =
  let q = Queue.create ~size_exponent:5 in
  let barrier = Barrier.create 2 in
  let d1 = Domain.spawn (fun () -> work 1 barrier q) in
  let d2 = Domain.spawn (fun () -> work 2 barrier q) in
  Domain.join d1;
  Domain.join d2;
  q

We can then inspect the content of the queue by popping it into a list.

let get_content q =
  let rec loop acc =
    match Queue.pop q with
    | None -> acc
    | Some a -> loop (a :: acc)
  in
  loop [] |> List.rev

Let's run it :

test () |> get_content;;
- : int list = [2; 1; 1; 1; 1; 1; 1; 1; 1; 1; 2]

This run results in a queue with only 11 elements. 9 elements are lost because of the misuse of the single consumer single producer queue.

About Composability

Composability is the ability to compose functions while preserving their properties. For Saturn data structures, the properties one could expect to preserve are atomic consistency (or linearizability) and all eventual progress properties, like lock freedom. Unfortunately, Saturn's data structures are not composable.

Let's illustrate that with an example. We want to write a slitting algorithm on Saturn's queue: several domains simultaneously slit a source queue into two destination queues in respect to a predicate. We expect our splitting function to be linearisable, which would manifest here by the source queue order is preserved in the destination queues. For example, slit [0;1;2;3;4], with a predicate that returns true for even numbers and false otherwise, should returns [0';2;4] and [1;3].

Here is how we can write slit with the functions provided by Saturn's queue.

let slit source pred true_dest false_dest : bool =
  match Queue.pop source with
  | None -> false
  | Some elt ->
      if pred elt then Queue.push true_dest elt else Queue.push false_dest elt;
      true

Domains run split until the source queue is empty:

let work source pred true_dest false_dest =
  while split source pred true_dest false_dest do
    ()
  done

Now to test it, we will run:

let test input =
  (* Initialisation *)
  let true_dest = Queue.create () in
  let false_dest = Queue.create () in
  let source = Queue.create () in
  List.iter (Queue.push source) input;

  let barrier = Barrier.create 2 in

  (* Predicate : split by parity *)
  let pred elt = elt mod 2 = 0 in

  let d1 =
    Domain.spawn (fun () ->
        Barrier.await barrier;
        work source pred true_dest false_dest)
  in
  let d2 =
    Domain.spawn (fun () ->
        Barrier.await barrier;
        work source pred true_dest false_dest)
  in
  Domain.join d1;
  Domain.join d2;
  (get_content true_dest, get_content false_dest)

The expected result for test [0; 1; 2; 3; 4] is ([0; 2; 4], [1; 3]). And if you try it, you will most probably get that result. Except it can also return in unsorted queues.

As the chance of getting an unsorted queue, we write a check function that runs test multiple times and counts the number of times the result is not what we wanted.

let check inputs max_round =
  let expected_even = List.filter (fun elt -> elt mod 2 = 0) inputs in
  let expected_odd = List.filter (fun elt -> elt mod 2 = 1) inputs in
  let rec loop round bugged =
    let even, odd = test inputs in
    if round >= max_round then bugged
    else if even <> expected_even || odd <> expected_odd then
      loop (round + 1) (bugged + 1)
    else loop (round + 1) bugged
  in
  Format.printf "%d/%d rounds are bugged.@." (loop 0 0) max_round

and try it:

# check [0;1;2;3;4;5;6] 1000;;
35/1000 rounds are bugged.

As expected, it is not working, and the reason is simply because our split function is not linerisable. We could make it atomic by using mutex, but then we loose the progress properties of the composed functions.

Extending Data Structures

Note that in the case above, we transfer from and to a queue of the same int Saturn.Queue.t type. It is most likely possible to write a val transfer : t -> t -> unit function with the right properties and add it directly to Saturn.Queue module.

If you think of any such functions, that is useful and missing, let's us know by creating an issue!

Composable Parallelism-Safe Data Structures

If you need composable parallelism-safe data structures, you can check kcas_data.

Testing

One of the many difficulties of implementating parallelism-safe data structures is that in addition to providing the same safety properties as sequental ones, they may also have to observe some liveness properties as well as additional safety properties specific to concurrent programming, like deadlock-freedom.

In addition to the expected safety properties, the main properties we want to test for are:

  • linearisability
  • lock-freedom for all the lock-free data structures
  • no potentially harmful data races

Here is a list of the tools we use to ensure them:

  • safety : unitary tests and qcheck tests check semantics and expected behaviors with one and more domains.
  • safety and liveness : STM tests check linearisability for two domains (see multicoretests library).
  • liveness : dscheck checks non-blocking property for as many domains as wanted (for two domains most of the time). See dscheck.
  • safety : no data race with tsan

See test/README.md for more details.

Benchmarks

There are a number of benchmarks in bench directory. You can run them with make bench. See bench/README.md for more details.

Contributing

Contributions are appreciated! If you intend to add a new data structure, please read this before.