Read-only memories?
icefoxen opened this issue · 11 comments
Pardon my naivety, but one of the slightly weirder parts of wasm, from a systems programmer's point of view, is the lack of separate data/rodata/bss segments. In particular, I'm used to the OS being able to ensure that data is read-only, so that any attempt to modify it is caught. This is useful for correctness but, more importantly, for security.
Has any thought been given to being able to mark memories as read-only, so they cannot be altered after initialization? This proposal currently seems like the best place to at least ponder the idea.
@lars-t-hansen brought it up in this issue, but there hasn't been any progress since. It still seems like a good idea to me.
The question is timely: https://www.usenix.org/conference/usenixsecurity20/presentation/lehmann
Note that it would be very difficult for us to make use of read-only memories in C/C++/Rust. The compiler has to statically know which pointers refer to which memories, but these languages do not distinguish read-only memory in their type systems (e.g. a const char * may or may not point into read-only memory), so the compiler would have to keep placing all objects in a single read-write memory, as it does today. The alternative would be to use fat pointers and turn every load and store into a switch on the memory index, which would be very expensive.
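To make that cost concrete, here is a minimal Python sketch. The fat-pointer layout, the two memories and every helper name are hypothetical, chosen only for illustration. Each load and store has to branch on the memory index before touching memory, and a routine that takes a const char * pays for that branch on every byte it reads, because the same code must accept pointers into either memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatPtr:
    """Hypothetical fat pointer: a memory index plus a byte offset."""
    mem: int
    offset: int


# Memory 0 is the read-write "main" memory; memory 1 is read-only.
rw_memory = bytearray(16)
ro_memory = b"hello, wasm!\x00\x00\x00\x00"


def load_u8(p: FatPtr) -> int:
    # The per-access switch on the memory index: the compiler cannot know
    # statically which memory p refers to, so it has to check at run time.
    if p.mem == 0:
        return rw_memory[p.offset]
    if p.mem == 1:
        return ro_memory[p.offset]
    raise IndexError(f"no memory with index {p.mem}")


def store_u8(p: FatPtr, value: int) -> None:
    if p.mem == 0:
        rw_memory[p.offset] = value
    elif p.mem == 1:
        raise PermissionError("store to read-only memory")
    else:
        raise IndexError(f"no memory with index {p.mem}")


def c_strlen(p: FatPtr) -> int:
    # A routine taking a `const char *`: it must handle pointers into either
    # memory, so every byte it reads goes through the dispatch in load_u8.
    n = 0
    while load_u8(FatPtr(p.mem, p.offset + n)) != 0:
        n += 1
    return n
```

With a single memory, load_u8 would collapse to rw_memory[p.offset]; the branch is the overhead that fat pointers add to every access.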
It's possible that this functionality could be exposed via language extensions or in more exotic LLVM-based languages, but it would not be able to be used efficiently by existing code.
Note that all of the above applies to multiple memories more generally and is not specific to read-only memories. The difference is that multiple memories are still useful because they allow shared-nothing module linking, which is not possible today. I don't see any way to make read-only memories generally useful right now.
The question is timely: https://www.usenix.org/conference/usenixsecurity20/presentation/lehmann
Indeed, it was that paper that made me return to this issue in the first place, though I'd bumped into it myself in my own compiler work (well, play) and thought "huh, that's interesting".
The fact that it would make it impossible to transparently interchange a pointer to a read-only memory with a read-write one is a good point I hadn't thought of. For example, if you pass a pointer to a function that reads from it, the function has to know whether to load from the read-only memory or from the read/write "main" memory. But... it has to be able to figure that out to use multiple memories at all, I would think?
Another security mitigation would be to put the stack and heap in separate memories, or to divvy things up even further, e.g. by putting "internal to this module" and "exposed to the outside world" data in different memories. Again, C/Rust don't distinguish "pointer into the stack" from "pointer into the heap", so they would have to handle either both or neither.
To make multiple memories useful at all, one would probably need fat pointers, and I'd think an implementation wouldn't switch on the memory index but would rather use it to index into a lookup table. An extra layer of indirection, but not THAT expensive. (A JIT could, naturally, specialize multiple copies of a function to access different memories without the extra indirection if it tried hard enough.) (Ooh, or a wasm implementation could replace all memory object indices with true memory addresses at compile time, and then your lookup table vanishes and you just have a base+offset load!)
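A minimal Python sketch of the lookup-table idea, assuming (hypothetically) that all memories live in one simulated host address space and that each instance keeps a table mapping memory index to (base, size). The general path pays one table lookup per access; the specialized path does the lookup once, the way a compiler could when the memory index is a compile-time constant, leaving only a bounds check and a base+offset load.

```python
# Simulated host address space holding two wasm memories back to back.
host = bytearray(64)

# Hypothetical per-instance table: memory index -> (base address, size).
memory_table = [(0, 32), (32, 32)]


def load_via_table(mem: int, offset: int) -> int:
    # One extra indirection: fetch the memory's base and size from the table,
    # bounds-check the offset, then do a base+offset load.
    base, size = memory_table[mem]
    if not 0 <= offset < size:
        raise IndexError("out-of-bounds access")
    return host[base + offset]


def specialize_load(mem: int):
    # When the memory index is known at compile time, the table lookup can be
    # done once, up front, leaving only the bounds check and base+offset load.
    base, size = memory_table[mem]

    def load(offset: int) -> int:
        if not 0 <= offset < size:
            raise IndexError("out-of-bounds access")
        return host[base + offset]

    return load
```

Baking the base in like this assumes the memory's base address never changes after specialization.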
Maybe it would be more useful to be able to designate ranges of an existing memory as read-only, and trap if they are written to? Continuing with that idea ends up converging on something like an operating system's page tables, which... may or may not be what one wants for WebAssembly. It would suddenly turn a very simple system into a very complex one. But that's how current hardware and software solve the problem, and it underlies several of the security guard features outlined in the Usenix paper.
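A minimal Python sketch of range-based protection, assuming a hypothetical API that lets a module mark byte ranges of its own memory read-only. Done in software, this costs a lookup (here a binary search over the protected ranges) on every store; hardware page tables make the equivalent check essentially free, but only at page granularity, which is part of why the idea converges on page tables. Only single-byte stores are modeled.

```python
import bisect


class Trap(Exception):
    """Stand-in for a wasm trap."""


class RangeProtectedMemory:
    """Linear memory in which selected byte ranges [start, end) are read-only.

    Ranges are assumed not to overlap; only single-byte accesses are modeled.
    """

    def __init__(self, size: int):
        self.data = bytearray(size)
        self._starts = []  # sorted start addresses of read-only ranges
        self._ends = []    # _ends[i] is the exclusive end of range i

    def protect(self, start: int, end: int) -> None:
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._ends.insert(i, end)

    def _is_read_only(self, addr: int) -> bool:
        # Find the last range starting at or before addr, then check its end.
        i = bisect.bisect_right(self._starts, addr) - 1
        return i >= 0 and addr < self._ends[i]

    def load(self, addr: int) -> int:
        return self.data[addr]  # reads are never restricted

    def store(self, addr: int, value: int) -> None:
        if self._is_read_only(addr):
            raise Trap(f"write to read-only address {addr:#x}")
        self.data[addr] = value
```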
This has come up before. Adding an mprotect-like instruction might be a better solution here. The two main concerns I can think of are how it would interact with JavaScript and with multiple threads.
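A minimal Python sketch of what a page-granular, mprotect-like operation could look like, assuming a hypothetical protect operation and one writable bit per 64 KiB wasm page. As with POSIX mprotect, the start address must be page-aligned and every page the range touches is affected.

```python
WASM_PAGE = 64 * 1024  # the wasm page size: 64 KiB


class Trap(Exception):
    """Stand-in for a wasm trap."""


class PagedMemory:
    """Linear memory with one writable bit per wasm page."""

    def __init__(self, pages: int):
        self.data = bytearray(pages * WASM_PAGE)
        self.writable = [True] * pages

    def protect(self, addr: int, length: int, writable: bool) -> None:
        # Hypothetical mprotect-like operation: like POSIX mprotect, the start
        # must be page-aligned, and every page the range touches is affected.
        if addr % WASM_PAGE:
            raise ValueError("address must be page-aligned")
        first = addr // WASM_PAGE
        last = (addr + length - 1) // WASM_PAGE
        for page in range(first, last + 1):
            self.writable[page] = writable

    def store(self, addr: int, value: int) -> None:
        page = addr // WASM_PAGE
        if not self.writable[page]:
            raise Trap(f"write to protected page {page}")
        self.data[addr] = value
```

The sketch is single-threaded and exposes no JavaScript view of the memory, so it sidesteps both concerns: with a shared memory, a protection change would presumably have to be observed consistently by every thread, and JavaScript views of the same buffer would have to respect it as well.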
One of the current uses of WASM is sandboxing user code. With the current model, it's impossible to give a WASM module read-only access to some memory (possibly belonging to another module, or to yourself), which limits what you can sandbox. mprotect won't solve this, because it would make the memory read-only globally, for every module that can access it.
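A minimal Python sketch of the difference, assuming a hypothetical feature where each importer gets its own handle onto a shared memory and the read-only permission lives in the handle rather than in the memory. The owner keeps write access while the sandboxed module can only read; Python's memoryview.toreadonly() stands in for whatever an engine would actually do.

```python
class Trap(Exception):
    """Stand-in for a wasm trap."""


class MemoryHandle:
    """Hypothetical per-importer handle onto a shared linear memory.

    The permission lives in the handle, not in the memory itself.
    """

    def __init__(self, backing: bytearray, writable: bool):
        view = memoryview(backing)
        self._view = view if writable else view.toreadonly()

    def load(self, addr: int) -> int:
        return self._view[addr]

    def store(self, addr: int, value: int) -> None:
        if self._view.readonly:
            raise Trap("store through a read-only handle")
        self._view[addr] = value


shared = bytearray(16)
host_handle = MemoryHandle(shared, writable=True)      # owner keeps write access
sandbox_handle = MemoryHandle(shared, writable=False)  # sandboxed module can only read
```

Writes made through host_handle are immediately visible through sandbox_handle, because both view the same bytes.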
having multiple memories with some of them read-only sounds like it'd require tagged pointers (aka (memory, address) pairs)... (those can effectively be treated as 64-bit pointers tho, especially if you can allocate arbitrarily many of them sequentially, so eh.)
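A minimal Python sketch of that packing, assuming (hypothetically) a 32-bit memory index in the high half and a 32-bit address in the low half. Ordinary pointer arithmetic keeps working within a memory, and if each memory occupies its own 4 GiB slot, allocated one after another, the packed value is simply a flat 64-bit address.

```python
MASK32 = 0xFFFF_FFFF


def pack(mem: int, addr: int) -> int:
    # High 32 bits: memory index; low 32 bits: address within that memory.
    if not (0 <= mem <= MASK32 and 0 <= addr <= MASK32):
        raise ValueError("memory index and address must each fit in 32 bits")
    return (mem << 32) | addr


def unpack(ptr: int) -> tuple:
    # Split a packed pointer back into (memory index, address).
    return ptr >> 32, ptr & MASK32
```

The flip side is that an address overflowing past the end of its slot silently carries into the next memory's index instead of trapping, so bounds checks would still be needed.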
our use case for read-only in wasm would be to AOT-compile wasm for an embedded platform with actual ROM. this would also require relocatable wasm, ofc, since the embedded platform's address space is far from linear, but there's no reason it should require multiple memories: just having a standardized way of marking certain regions as going in ROM vs. having to be copied to RAM would be good enough.
it's best to make these things standard so multiple compilers from multiple vendors can easily target them. we can probably deal with tagged pointers if it comes to it (they don't seem too hard to optimize out, but we may well turn out to be wrong about this... ofc, our use case involves 32-bit embedded platforms, so it might be harder than it looks). ah well.
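A minimal Python sketch of what an AOT compiler might do with such a marking, assuming a hypothetical per-segment rom flag (no such flag exists in wasm today) and made-up ROM and RAM base addresses. Segments flagged as ROM are used in place; every other segment keeps its initial image in ROM and gets a copy into RAM at boot. This covers placement only: the compiler would still need relocation information to rewrite references to each segment.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    data: bytes
    rom: bool  # hypothetical standardized flag: "may stay in ROM"


def place_segments(segments, rom_base: int, ram_base: int):
    """Assign each data segment a physical address on a target with separate
    ROM and RAM regions. Returns (placements, startup_copies), where each
    startup copy is a (rom_src, ram_dst, length) triple to run at boot."""
    placements = {}
    startup_copies = []
    rom_next, ram_next = rom_base, ram_base
    for seg in segments:
        if seg.rom:
            # Read-only data is used in place and never copied.
            placements[seg.name] = rom_next
        else:
            # Writable data keeps its initial image in ROM but lives in RAM.
            startup_copies.append((rom_next, ram_next, len(seg.data)))
            placements[seg.name] = ram_next
            ram_next += len(seg.data)
        rom_next += len(seg.data)
    return placements, startup_copies
```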
As long as we're talking about entire memories: Each memory currently has a bounds check limit or uses some kind of page protection trick. A memory could have two limits, one for reading (the true limit) and one for writing (the true limit for r/w memories and zero for r/o memories); or if page protection is used, the memory would be write-protected. This would not be completely free since the additional limit would need to be loaded and/or would occupy a register from time to time; and bounds check elimination optimizations would be less effective because reads would not imply anything about the bounds of the writes and vice versa.
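A minimal Python sketch of the two-limit scheme, using explicit bounds checks rather than page protection; the class and method names are made up for illustration. A read-only memory is just one whose write limit is 0, and stores are checked against the write limit, so a successful load at some address tells you nothing about whether a store there is allowed.

```python
class Trap(Exception):
    """Stand-in for a wasm trap."""


class LinearMemory:
    """Bounds checks with separate read and write limits.

    A read-only memory is simply one whose write limit is 0.
    """

    def __init__(self, data: bytes, read_only: bool = False):
        self.data = bytearray(data)
        self.read_limit = len(self.data)
        self.write_limit = 0 if read_only else len(self.data)

    def load(self, addr: int, size: int = 1) -> bytes:
        if addr < 0 or addr + size > self.read_limit:
            raise Trap("out-of-bounds read")
        return bytes(self.data[addr:addr + size])

    def store(self, addr: int, value: bytes) -> None:
        # Checked against write_limit, not read_limit: a load that succeeded at
        # addr says nothing about whether a store at addr is allowed.
        if addr < 0 or addr + len(value) > self.write_limit:
            raise Trap("out-of-bounds or read-only write")
        self.data[addr:addr + len(value)] = value
```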
having multiple memories with some of them read-only sounds like it'd require tagged pointers...
Depends how you implement them. You can ask the OS to mark particular memory regions as read-only for you, then trap if you try to write to them. The wasm runtime catches the trap and handles it. For preventing writes to actual ROM... yes, you'd probably need to do the accounting yourself.
Cool idea
I'm closing this for now on this repo. I think it is an important discussion, but it is a complex problem that would clearly have to be a separate proposal.