sosy-lab/sv-benchmarks

Benchmarks for weak memory models

hernanponcedeleon opened this issue · 3 comments

This issue is to keep track of discussions about adding new benchmarks to the repository exposing weak memory behaviours and the creation of a new SVCOMP category for weak memory models.

On 25/04/2021 we had a meeting to discuss the step forward. The following people participated:

  • Dirk Beyer
  • Zsófia Ádám
  • Levente Bajczi
  • Matthias Heizmann
  • Dominik Klumpp
  • Henrich Lauko
  • Hernan Ponce de Leon

In that meeting we agreed on the following:

  1. we will initially focus our efforts in x86
  2. we will use C11 atomic library for the benchmarks

Topics that are still open for discussion:

  1. we need to agree on which formalisation of x86 we will use. The agreement is that we should have both an axiomatic and operational semantics definitions which have been shown equivalents: this paper seems to give us what we want.
  2. we need to decide how to handle hardware specific instructions (e.g. fences and compare-and-swap). Atomics builtins might be an option. There was another (more general) proposal using bell extensions.
  3. witnesses: we need witness validators. MetaVal was proposed as an alternative: it requires coming up with a program transformation. I'm working on support for witness validation in Dartagnan. It will probably also handles wmm but there are some things about witnesses that requires some discussion.

@leventeBajczi I took a look to the bell extensions but it is not very clear to me how we can use this to define hardware fences at the C level.

For example, this is part of the Linux Kernel memory model bell file

enum Accesses = 'once (*READ_ONCE,WRITE_ONCE,ACCESS_ONCE*) ||
		'release (*smp_store_release*) ||
		'acquire (*smp_load_acquire*) ||
		'noreturn (* R of non-return RMW *)
instructions R[{'once,'acquire,'noreturn}]
instructions W[{'once,'release}]
instructions RMW[{'once,'acquire,'release}]

enum Barriers = 'wmb (*smp_wmb*) ||
		'rmb (*smp_rmb*) ||
		'mb (*smp_mb*) ||
		'rcu-lock (*rcu_read_lock*)  ||
		'rcu-unlock (*rcu_read_unlock*) ||
		'sync-rcu (*synchronize_rcu*) ||
		'before-atomic (*smp_mb__before_atomic*) ||
		'after-atomic (*smp_mb__after_atomic*) ||
		'after-spinlock (*smp_mb__after_spinlock*)
instructions F[Barriers]

My understanding for this is that e.g. acquire is now a new instruction that can have either read or read-modify-write semantics (I don't know how you differentiate between the two). Similarly, rcu-lock is a new type of fence.

While I kind of see how this fits in the whole CAT framework, I'm not sure how we could use this for svcomp and C files. Can you please elaborate what you had in mind?

I might be wrong, but I think for example R[once], W[release], F[wmb], etc will be the new instructions. My idea was to use this specification to create specific instructions (or events) which can be used in the CAT specification, should we choose to formalize the memory model in that language. Then, macros or functions can be defined to use these instructions.

For example, let's say we want to create a lightweight and a heavyweight fence instruction, which can be referred to as F[light] and F[heavy]. In the memory models, a specific semantics can be assigned to these events (different per model, if it is necessary), and in the C files they can appear as __F_light() and __F_heavy().

Without such a bell file, we could only use instructions that appear in a specific architecture (but I might be mistaken). This way, we could achieve two things:

  1. Have the complete freedom to create new instructions types, if a new memory model needs it, without being constrained to a specific ISA
  2. Have a formal specification of all the instructions that can appear in any benchmark or memory model

Of course, the usefulness of this approach depends on the use of the cat specification language for memory models - if another language is used, this complicates things unnecessarily.

I had a chat with one of my colleagues yesterday and I think now I have a better understanding of how to interpret these bell files. This is how I currently understand it (@ThomasHaas feel free to correct me if I'm wrong).

The CAT language only has 4 different types of (abstract) instructions: R (reads), W (writes), RMW (read-modify-write) and F (fences). By allowing tags (those defined in an enum type) it allows to create more concrete instructions. E.g.

instructions R[{'once,'acquire,'noreturn}]

tells us that we have a READ_ONCE instructions which is a read and has the tag once. Similarly, smp_load_acquire is a read instruction with tag acquire. It is the responsibility of the tool developer to do the correct matching between the C instruction (e.g. smp_load_acquire) and the tag (acquire). I think one way of having this documented and standardised is using the comment in the enum type like

enum Accesses = 'once (*READ_ONCE,WRITE_ONCE,ACCESS_ONCE*) || ...

The semantics of new instructions are covered by the CAT file. E.g. for TSO-x86 we have

let mfence = po & (_ * MFENCE) ; po

and I assume MFENCE is defined as a fence tag in some bell file (I could not find this anywhere, but it might be because mfence is a basic/standard fence)

enum x86-Barriers = 'MFENCE
instructions F[x86-Barriers]

All the above means that we will have an mfence relation between two events in program order (po) if there is a fence instruction tagged with MFENCE between them. Then the CAT file of x86-TSO can make use of this newly defined relation.

I think we can have an agreement to use extern C functions with names __EVENT_TAG() where EVENT in {R, W, RWM, F} and TAG should defined as a member of an enumeration type tagging that kind of events, i.e. the following should be part of the bell file

enum myTag = ... | TAG | ...
EVENT[myTag]

For x86-TSO we would have a "C instruction" (actually a function) __F_MFENCE() which we would interpret a hardware barrier for x86 architectures, meaning that if we have the following C code

int a = x;
__F_MFENCE();
x = 1;

we know that instructions a=x and x=1 are ordered via the mfence relation.