Possible data races in Task.parallel_scan
OlivierNicole opened this issue · 8 comments
Once again, I am reporting a ThreadSanitizer data race warning. I haven't had time to investigate it in detail yet, but I intend to look at it more closely soon.
How to reproduce
Pin Domainslib to revision b8de1f71. Pin qcheck to revision f09f42d0. Switch to the ThreadSanitizer-instrumenting compiler, then build and execute the following test (from https://github.com/jmid/multicoretests/):
```ocaml
open QCheck
open Domainslib

(* Property: parallel_scan with (+) over an all-ones array of length
   [array_size] must yield the prefix sums [| 1; 2; ...; array_size |]. *)
let test_parallel_scan =
  Test.make ~name:"test Task.parallel_scan" ~count:1000
    (pair (int_bound 10) small_nat)
    (fun (num_additional_domains, array_size) ->
       let pool = Task.setup_pool ~num_additional_domains () in
       let a =
         Task.run pool (fun () ->
             Task.parallel_scan pool (+) (Array.make array_size 1))
       in
       Task.teardown_pool pool;
       a = Array.init array_size (fun i -> i + 1))
;;

QCheck_runner.run_tests_main [ test_parallel_scan ]
```
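For reference, the property being tested is that parallel_scan with (+) over an all-ones array computes the inclusive prefix sums [| 1; 2; ...; array_size |]. A sequential reference version of that scan (just a sketch to make the expected result explicit; Domainslib's parallel implementation is of course structured differently) looks like this:

```ocaml
(* Sequential inclusive prefix scan, for reference:
   scan ( + ) [| 1; 1; 1; 1 |] = [| 1; 2; 3; 4 |] *)
let scan op a =
  let n = Array.length a in
  if n = 0 then [||]
  else begin
    let r = Array.make n a.(0) in
    for i = 1 to n - 1 do
      r.(i) <- op r.(i - 1) a.(i)
    done;
    r
  end

let () = assert (scan ( + ) (Array.make 5 1) = [| 1; 2; 3; 4; 5 |])
```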
You get the following two reports, both in Task.parallel_scan:
- https://gist.github.com/OlivierNicole/1459b36f3851d9b311dbddfe4361cf7f
- https://gist.github.com/OlivierNicole/922a70c6655d7533ef21d6a5790efaad
Note that:
- The line numbers are always a little off; the code address offset reported for item #0 of the backtraces is always off by one. Here, it seems that the possibly racy accesses are the writes in prefix_s.
- A current limitation of the ThreadSanitizer instrumentation is that backtrace information is not recorded in OCaml code, so the “previous” backtraces are probably incomplete and may be incorrect.
The first race is between:
Line 214 in 3a74279
and the initializing write at
Line 210 in 3a74279
so, according to the memory model, this is not a real race: the PLDI'18 paper says there is supposed to be a synchronization between an initializing write and the sharing of a value. I don't understand yet why TSan didn't see this synchronization, though.
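If I understand the argument correctly, the pattern is the following (a minimal sketch with illustrative names; I use an Atomic.t to make the publication point explicit, whereas in the code under discussion the release ordering is supposed to come from caml_modify or Domain.spawn):

```ocaml
(* Sketch: initialize-then-publish. The producer fills a fresh array with
   plain stores and only then publishes it; a consumer that observes the
   published array is supposed to also observe the initializing writes. *)
let slot : int array option Atomic.t = Atomic.make None

let producer () =
  let a = Array.make 8 1 in     (* initializing (plain) writes *)
  Atomic.set slot (Some a)      (* publication: provides the release ordering *)

let consumer () =
  let rec poll () =
    match Atomic.get slot with  (* acquire side of the publication *)
    | Some a -> Array.fold_left ( + ) 0 a
    | None -> Domain.cpu_relax (); poll ()
  in
  poll ()

let () =
  let p = Domain.spawn producer in
  assert (consumer () = 8);
  Domain.join p
```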
It might be worthwhile to check whether the Array.copy primitive has the necessary synchronisation, namely a release fence.
The copying is performed in caml_array_gather; in the case of small arrays (which is the case in this test), the array is allocated in the minor heap and a plain memcpy is used.
https://github.com/ocaml/ocaml/blob/50b09531f658c1ebe502cb566dee52c3e32d3c61/runtime/array.c#L448-L460
Maybe some code paths do not go through the fence when that array is promoted to the major heap. (Is it always promoted?) I don't know where the code that handles this is in the runtime.
There is in fact no question of promotion, as it was pointed out to me that it is no longer necessary in the stop-the-world minor collection design that was retained. We still need a fence, though, and indeed there isn't one in caml_array_gather (used by Array.copy, append and concat).
I'm taking the liberty of pasting here an explanation by @kayceesrk from a discussion on Slack:
Generally, there are only two ways an object can be shared with another domain.
- Through the closure that is used in Domain.spawn
- Assigning to a shared reference. This goes through caml_modify which has the release fence.
Since caml_initialize will precede a caml_modify or Domain.spawn, which has the necessary fence, caml_initialize doesn’t need the fence.
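To make that concrete, here is a rough sketch of the two routes, with illustrative names (in this self-contained example the Domain.spawn/Domain.join calls add ordering of their own; the point is only to show where the publication happens in each case):

```ocaml
(* Route 1: the value is shared through the closure passed to Domain.spawn. *)
let route1 () =
  let a = Array.make 4 7 in                               (* initializing writes *)
  let d = Domain.spawn (fun () -> Array.fold_left ( + ) 0 a) in
  Domain.join d

(* Route 2: the value is shared by assigning it to a shared reference;
   the assignment goes through caml_modify, which carries the release fence. *)
let shared : int array option ref = ref None

let route2 () =
  let a = Array.make 4 7 in                               (* initializing writes *)
  shared := Some a;                                       (* caml_modify *)
  let d =
    Domain.spawn (fun () ->
        match !shared with
        | Some b -> Array.fold_left ( + ) 0 b
        | None -> 0)
  in
  Domain.join d

let () =
  assert (route1 () = 28);
  assert (route2 () = 28)
```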
Looking closer, the race here is between the initializing write (a plain store) in Array.copy and a read in caml_modify. Indeed, before doing a store, caml_modify does a plain load (of the argument *fp):
https://github.com/ocaml/ocaml/blob/357b42accc160c699219575ab8b952be9594e1d9/runtime/memory.c#L151
and that is done before the fence.
From my (not very experienced) point of view this one is technically a real race… maybe @sadiqj, @maranget or @damiendoligez have an opinion about it?
Since I no longer get this race report with the latest versions of ocaml-tsan, I'll attribute it to insufficient instrumentation of the runtime back then, and close this.