KhronosGroup/Vulkan-MemoryModel

consistent[X] is too strict

raphlinus opened this issue · 3 comments

The current formulation of consistent[X] rejects a class of execution traces involving nonatomic races. This, I believe, prevents clean expression of “failing” litmus test cases, and prevents correct interpretation of one program I’ve found. There is a simple fix, which behaves much more like I’d expect and has manageable impact on the existing test suite.

I’ll develop my argument through three simple programs. The first two represent trivial fail and pass cases, respectively, and the third is one that distinguishes memory models. It cannot be expressed in C11, it is accepted in PTX, and, I believe, is rejected by the Vulkan Memory Model, though capturing that in a test is tricky.

Trivial fail: nonatomic race

This is a case where one thread writes a value and another reads it. This is obviously a data race and should be rejected, but expressing that in a test is subtle.

NEWTHREAD st.scopedev.sc0 x = 1 NEWTHREAD ld.scopedev.sc0 x

There are two executions, one where rf is empty (so #RFINIT = 1), and the other where rf is a single edge, let’s call it W1 -> R1. The first is fine - it is consistent because rf is empty, and dr contains R1 <-> W1 because there is no locord relation.

However, the second is problematic. Here, consistent[X] fails because rf is not contained in imm([W]; locord); the latter is empty because there are no locord relations. I believe this is wrong; this execution should be considered consistent, but should of course still be considered a failure because of the data race.

Note this program can be expressed and analyzed in terms of the Wickerson et al 2017 formalization of the C11 memory model. There, the second execution is also not considered consistent (see (30), the second line of consistent_{C11}, shown in Figure 6). I conjecture that this is a minor flaw, because while this formalization does not reject this specific execution as racy, it still correctly rejects the program because of the first execution.

Trivial success: atomic order

Note: this test is the first three threads of asmo.test.

NEWTHREAD st.atom.scopedev.sc0 x = 1 NEWTHREAD st.atom.scopedev.sc0 x = 2 NEWTHREAD ld.atom.scopedev.sc0 x = 1 ld.atom.scopedev.sc0 x = 2

Again, there are two executions, in this case corresponding to two values of asmo: W1 -> W2 and W2->W1. The first is consistent, the second is inconsistent, not because of anything that went wrong, but merely because it doesn’t match the observation in the third thread.

The lesson from this example is that the existence of an inconsistent solution is not a problem. This should be considered a trivially passing test.

Tricky case: mixed atomics and nonatomics

This one is a little more complicated to express, because there is control flow in the source program. Is it a data race or not?

thread1() {
    atomicStore(x, 1);
}

thread2() {
    if (atomicLoad(x) == 1) {
        read(x);
    }
}

This maps to two traces, depending on the predicate. First, false:

NEWTHREAD
st.atom.scopedev.sc0 x = 1
NEWTHREAD
ld.atom.scopedev.sc0 x = 0

This is consistent and has no data races. The nonatomic load is not invoked because the predicate is false.

Now the trace where the predicate is true:

NEWTHREAD
st.atom.scopedev.sc0 x = 1
NEWTHREAD
ld.atom.scopedev.sc0 x = 1
ld.scopedev.sc0 x

There are two executions, corresponding to the nonatomic read being RFINIT, or reading from the atomic write. In the current state, neither is consistent, for different reasons. The first execution is caught by the acyclic test; there is a from-read (fr) from the nonatomic read to the write, which forms a cycle with the read-from (rf) the atomic write to the atomic read and the locord form the atomic read to the nonatomic read. This is arguably correctly classified as inconsistent, though it’s certainly possible to imagine a sufficiently weak implementation that the nonatomic read may read 0. The second execution is (incorrectly) classified as inconsistent for the same reason as the plain data race above.

This program cannot be expressed in the C11 memory model because it mixes atomic and nonatomic access to the same location. Therefore it is not meaningful to ask whether the Wickerson formalization will accept or reject it.

It is accepted by the PTX memory model (it is basically the CoRR test in Figure 9a, which guarantees the nonatomic load will see 1 if the preceding atomic load did).

How to interpret tests

A generic passing test (mp.test is a good example) should be:

SATISFIABLE consistent[X] && #dr = 0
NOSOLUTION consistent[X] && #dr > 0

The first line is the same as the current state. Informally, it says, “there exists a consistent execution with no races.” The second line means, “all consistent executions are race-free.” Note that it doesn’t invoke #RFINIT as in the current formulation of the test. I consider that a workaround rather than a principled expression of the intent of the test.

There are two notions of failing tests. One constrains the implementation (asmo.test is an example) and is just NOSOLUTION consistent[X]. This is fine. The other constrains valid programs (eg test16.test) and needs more work. I believe it should read:

SATISFIABLE consistent[X] && #dr > 0

If there are multiple executions, some of which are racy, some not, I think it should be considered a failing test, therefore I don’t believe there’s any need for a second NOSOLUTION line.

The fix

I believe the second clause in consistent[X] should be changed to:

  no (X.rf . (stor[X.R - X.A])) & (stor[X.W] . (X.locord) . ^(stor[X.W] . (X. locord)))

It might be worth defining a helper function for z.^z, meaning transitive chains of at least two, which would simplify this to:

  no (X.rf . (stor[X.R - X.A])) & two_or_more[stor[X.W] . (X.locord)]

I’ll try to explain the change informally. The current state (expanding the definition of imm) says “an atomic read is from a write ordered by locord, and is not shadowed by another write ordered by locord.” My proposed change drops the first conjunct and keeps the second. For single-threaded programs it is the same, as all (nonatomic) memory events participate in locord, but for racy multithreaded cases, it expands the set of consistent executions considerably to include those races.

For reference of how PTX analyzes nonatomic consistency in single-threaded programs, see Axiom 5 (section 3.5.5 in the paper), where it is captured by an acyclic condition on rf, fr, and program order. A read from a shadowed write in a single-threaded context (forbidden by my proposed change) would induce such a cycle, so one direction of an implication is established. Whether there are other cycles that violate causality that would not be caught by my proposed rule is an open question. If not, then in practice the set of executions considered consistent in Vulkan and PTX should be equivalent (modulo of course the differences in programs that can be expressed).

Summary

The Wickerson formalization on which the Vulkan Memory model is based has a minor flaw of misclassifying some racy executions as inconsistent. It's not clear this is a serious problem, as all cases I've looked at correctly classify the program as consistent but racy. However, when mixing atomics and nonatomics that assumption breaks down, and I've given an example of a program which I believe is misclassified (all consistent executions are race-free, but the racy executions are classified as inconsistent).

A relatively small change to the text fixes this, classifying this sort of racy program as consistent. Thus, the "pass" and "fail" conditions for tests can be written more cleanly, in particular not invoking RFINIT. I believe it affects the formalization and not the spec text, though it may be that on review there is some language that could be clarified.

I'm happy to provide more detailed litmus tests and Alloy graphs if they're needed, though I hope this presentation is adequate.

References

Wickerson et al 2017: Automatically Comparing Memory Consistency Models

PTX: A Formal Analysis of the NVIDIA PTX Memory Consistency Model

Pinging @alan-baker and @RobinMorisset for tracking purposes.

@raphlinus if you have the changes ready, do you mind making a pull request? If not, I can make it.

I'm happy to PR the change to the Alloy file, but I don't have one ready with the test changes. If you've got that, feel free.