Semantics of bulk memory operations
Opened this issue · 5 comments
The semantics of bulk memory operations on shared linear memory seems to be currently underspecified. If the memory model is mostly the same as JS's, I assume these should be unordered atomic.
AFAICT V8's current implementation is also unsafe in that it just calls out to C memmove
and memcpy
under the hood, which are non-atomic.
@conrad-watt should know best.
Currently the specification is just unordered byte writes (it would be equivalent to have one big tearing unordered write). This is a deliberate choice which means that any racing read may independently non-deterministically see an arbitrary mixture of "before" bytes and "after" bytes, effectively blessing [almost?] any reasonable implementation as conformant (including optimised strategies involving vectors etc). Because we have a full eager bounds check on the entire range of the bulk instruction before any writes take place, this non-determinism is only visible to racing reads (analogy - C undefined behaviour), not to any correctly synchronized code, or any code that synchronously inspects the memory after an OOB trap. We did previously agree that if any future feature is able to interrupt a bulk instruction midway after the initial bounds check (e.g. dynamic page write protection), the specified result would be a memory state where each byte write of the bulk instruction may have independently non-deterministically happened or not happened - we haven't had to put this into practice yet!
It's true that calling out to a fully abstract C memmove
is technically C-level UB, but so long as some basic reasoning is done about the underlying implementation (and my potentially stale understanding was that V8 controls its memmove
implementation rather than delegating straight to the system), things should be fine. Delegating Wasm memory accesses to C libraries in general needs to be done with care even in single-threaded code because of different UB expectations around aliasing etc.
Currently the specification is just unordered byte writes
Thanks for clarifying.
It's true that calling out to a fully abstract C memmove is technically C-level UB, but so long as some basic reasoning is done about the underlying implementation (and my potentially stale understanding was that V8 controls its memmove implementation rather than delegating straight to the system), things should be fine. Delegating Wasm memory accesses to C libraries in general needs to be done with care even in single-threaded code because of different UB expectations around aliasing etc.
So the intended implementer guidance is "reason at the level of the ISA"?
There's currently a discrepancy between SharedArrayBuffer
builtins and bulk memory operations in V8. For SAB builtins that do the moral equivalent of bulk memory operations (like set
), there are manual relaxed equivalents (e.g. Relaxed_Memmove
). This I think was done somewhat out of an abundance of caution but also done to quell TSAN. The Wasm equivalent just calls out to std::memmove
, which gave me pause.
So the intended implementer guidance is "reason at the level of the ISA"?
Yes, for maximum performance. Relaxed_Memmove
would also be a conformant but potentially less ambitious way to implement bulk instructions. I would potentially be a little suspicious if std::memmove
is literally going directly to the user's dynamically-linked system library - but only because it's permitted by the C/C++ spec in theory for such a library to be "maliciously conformant" - in practice it would be hard for a natural implementation to do something that violates the current Wasm spec.
To me it looks like Relaxed_Memmove itself contains an UB from C++ point of view.
Does the C++ standard guaranty that atomic and T are the same thing and can be casted to each other like this?
reinterpret_cast<volatile AtomicWord*>(dst)
Does the C++ standard guaranty that atomic and T are the same thing and can be casted to each other like this?
Yep, good eye. This is UB, though V8 does depend on it everywhere. I think since C++20's atomic_ref
there's at least a compliant feature to transition to.
(In my mind, and this isn't a very strong argument, this is like how everyone depends on type punning.)