What's the RAM overhead of the CheriOS "software compartments"?
shafqatevo opened this issue · 15 comments
Hi, this is a great project. Just curious what's the typical memory overhead of the software compartments in terms of kilobytes/bytes?
Thanks!
This question may need some more context. Are you asking about the minimum cost of such compartments (that is, the smallest CheriOS compartment that can be constructed) or do you have some other baseline relative to which these compartments are "overhead"?
Yes, I meant the smallest compartment, which I assume will approximate the system’s overhead for each compartment instance.
For example, WebAssembly is now being used for trusted computing and the lowest overhead for each additional WebAssembly module instance is like 8KB (WAMR project).
Please note that I am only asking about overhead per compartment instance, not the CheriOS system’s (I guess the nanokernel’s) base memory requirement.
Hi @LawrenceEsswood or anyone else, any insight on this will be much helpful! Thanks!
Also, if you can share what is the typical launch latency of an enclave...
At the end of the day, all the different CheriOS compartmentalisation styles rely on sealing. Sealing is, for the most part, a zero-cost abstraction (assuming you already use CHERI). If you already have a CHERI capability representing pointers to your functions / data, sealing carries no extra overhead. Switching logic can be shared between compartments. The "minimum" overhead for a compartment is therefore nothing.
However, you do get extra state to avoid sharing. It depends a lot on what kind of "compartment" it is. If your compartment does not require its own separate stack, it stays pretty small. Generally, about two capabilities. One is used for a lock (for CFI), the second to unseal extra arguments. Examples of this actually happening are the nanokernel itself, and "Fast Leaf calls", although the fast leaf calls have half a dozen extra capabilities allocated per compartment for implementation reasons.
If you need a separate stack, a little larger. Even with good optimisation, at least an extra page per compartment for page-based fragmentation reasons. It also increases stack frame size, as extra registers are pushed (the caller-saved set) before calls. Default CheriOS behavior is to give each dynamic library its own stack, as well as a kernel stack for each thread.
If you want authentication of those compartments, this adds a sha256 hash plus a couple extra capabilities for a signature. On CheriOS these signatures are generally used at the granularity of processes (which may have extra compartmentalisation within).
There are also extra little overheads to ensure separation. To ensure global accesses go through a capability, a Captable (like a GOT) is needed to access global symbols (this was already present anyway in a purecap CHERI build). But you might count this as an extra capability per symbol required by a compartment. The heap on CheriOS is already designed to, by default, allow for each caller to be from a separate compartment and to ensure they do not get access to each other's allocations. At worst, if all malloc allocations are just one byte, the extra tracking gives something like a 20% overhead.
So if you want numbers, on a 64-bit platform I suppose anywhere between 32 bytes and maybe 5K. That does not include code size changes from some inability to inline across compartment boundaries.
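As a rough sanity check on that range, here is a minimal sketch of the arithmetic. It is one plausible reading of the figures above, not an official breakdown: the 4 KiB page size, the helper names, and the idea of simply summing the pieces are all assumptions made for illustration.

```python
# Back-of-envelope estimate of per-compartment state on a 64-bit CHERI platform.
# Assumptions (not from CheriOS itself): 4 KiB pages, and that the costs
# described above simply add up.

CAP_BYTES = 16        # a 128-bit capability on a 64-bit machine
PAGE_BYTES = 4096     # assumed page size
SHA256_BYTES = 32     # size of a SHA-256 digest


def stackless_compartment_bytes(extra_caps=0):
    """Two capabilities (CFI lock + unsealing key) plus any extras."""
    return (2 + extra_caps) * CAP_BYTES


def stacked_authenticated_compartment_bytes(signature_caps=2):
    """Stackless state + one stack page + a hash + capabilities for a signature."""
    return (
        stackless_compartment_bytes()
        + PAGE_BYTES
        + SHA256_BYTES
        + signature_caps * CAP_BYTES
    )


if __name__ == "__main__":
    print(stackless_compartment_bytes())              # 32
    print(stackless_compartment_bytes(extra_caps=6))  # 128
    print(stacked_authenticated_compartment_bytes())  # 4192
```

Under these assumptions the lower bound of 32 bytes falls out directly from two capabilities, and a compartment with its own stack page plus authentication lands a little over 4 KiB, which is in line with the "maybe 5K" upper estimate.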
That is super helpful and thanks for such a prompt reply! I have skimmed through your CheriOS thesis. Will read it thoroughly soon. Objective is to understand the approaches and to design a new language runtime on top.
Any ballpark number on the startup/launch latency of processes in CheriOS (let's say for just a Hello World program)? Is it in microseconds or ms or nanoseconds? That does depend on specific hardware but any ballpark will be helpful or comparison with say Linux...
On QEMU? Significantly faster than BSD. BSD is seconds to minutes (depending on the amount of tracing required). CheriOS is at least an order of magnitude faster. Fast enough that I developed applications by just restarting the whole OS on each build cycle.
Wow! Great to know that...
Hi @LawrenceEsswood, a basic question - what's the memory overhead for just an integer in CHERI / CheriOS? Assuming additional overhead will be there due to capability-security? Or is it like a regular pointer with no additional overhead?
Thanks for all your clarifications - really appreciate it...
It depends what you mean by "Integer".
Do you mean the language level types? Such as "int" in a language like C/C++? The CHERI compiler does not change the size of most integer types. The exceptions to this are the types meant to hold the integer versions of pointers. e.g. "intptr_t" or "uintptr_t".
Of course, if you need a pointer to that integer (and you are compiling with pointers represented by capabilities, which is of course a choice), then the size of the reference doubles.
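To make the size effect concrete, the following sketch models C-style struct layout for a small list node holding an `int` and a pointer. The sizes are assumptions for a typical 64-bit target (4-byte `int`, 8-byte flat pointer, 16-byte capability aligned to 16 bytes), and `struct_size` is a hypothetical helper written for this illustration, not part of any CHERI toolchain.

```python
# Model of C struct layout: fields are (size, alignment) pairs in bytes.
# Assumed sizes for a typical 64-bit target; a real compiler is the authority.

INT = (4, 4)         # plain int: unchanged by CHERI
FLAT_PTR = (8, 8)    # pointer lowered to a flat 64-bit address
CAP_PTR = (16, 16)   # pointer lowered to a 128-bit capability


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def struct_size(fields):
    """Return the padded size of a struct with the given fields, in order."""
    offset = 0
    max_align = 1
    for size, alignment in fields:
        offset = align_up(offset, alignment) + size
        max_align = max(max_align, alignment)
    return align_up(offset, max_align)


if __name__ == "__main__":
    print(struct_size([INT, FLAT_PTR]))  # 16: node with a flat pointer
    print(struct_size([INT, CAP_PTR]))   # 32: node with a capability pointer
    print(struct_size([INT] * 4))        # 16: an int array is the same either way
```

In this model the integers themselves never grow; only the pointer field and the padding needed to align it do.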
Thanks! I actually meant any primitive type for that matter: int, bool, char etc. I guess the first question is whether capability security applies at all at that level in CHERI and, if yes, what's the memory overhead to maintain the capability.
Basically trying to understand what's the memory impact of the entire capability stuff on basic primitive types, compartment and process level which you answered.
How does the "pointer represented by capabilities" relate to the tagged pointer architectures of the 60s - is it basically the same?
CHERI is a tagged pointer architecture, much inspired by those architectures. A capability has one integrity bit per 128 bits (on a 64-bit machine) or per 64 bits (on a 32-bit machine).
CHERI capability material is stored in the reference, not the target. It can protect individual bytes, and the size of those bytes does not grow. However, you will need some way to refer to those bytes (i.e., a pointer). A pointer is not "just an integer". It is a language-level type that the compiler has a choice in how it lowers into hardware. The CHERI compiler we provide offers two options: lower into a flat address, or a capability. The second is more flexible, as it allows the pointer to be shared across compartments. The first requires finding another capability to actually dereference the address, but does not create extra overhead. It is also possible to use a hybrid of the two, and we provide language-level annotations for programmers to use if they wish to manually control how pointers are lowered.
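The integrity-bit figures quoted above translate into a small fixed memory cost. The sketch below just does that arithmetic; it assumes one tag bit per capability-sized granule over the whole of memory and ignores how any particular implementation actually stores or caches tags. The function names are hypothetical.

```python
# Tag-bit overhead implied by "one integrity bit per capability-sized granule".
# Pure arithmetic; how tags are stored is implementation-specific.


def tag_overhead_fraction(capability_bits):
    """Fraction of extra storage: one tag bit per capability_bits of memory."""
    return 1 / capability_bits


def tag_storage_bytes(memory_bytes, capability_bits):
    """Bytes of tag storage needed to cover memory_bytes of tagged memory."""
    granules = memory_bytes // (capability_bits // 8)
    return granules // 8  # one bit per granule, eight bits per byte


if __name__ == "__main__":
    gib = 2**30
    print(tag_overhead_fraction(128))    # 0.0078125  (about 0.78%)
    print(tag_overhead_fraction(64))     # 0.015625   (about 1.56%)
    print(tag_storage_bytes(gib, 128))   # 8388608    (8 MiB per GiB)
    print(tag_storage_bytes(gib, 64))    # 16777216   (16 MiB per GiB)
```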
Wow - exactly as I would've imagined! Many thanks for the explanations.
What's the future plan/roadmap for CheriOS in general? Is there any reference regarding that? Ideally I'd imagine something like Cheri and compartments/enclaves replacing things like Xen, unikernels, containers, even ultimately container orchestration (once distributed/clustering features are incorporated).
Well, I am currently no longer actively working on it. Although (Job permitting) I hope to return to it one day.
CheriOS's direction, I imagine, is as a research platform, not a commercial product. I know of a few efforts from masters / PhD students who are currently working on it. CheriOS is simply not mature enough (and likely will never be without more hands on deck) to find itself in real systems. My hope is that some of the ideas from CheriOS will make their way into other CHERI-based systems.
I am not sure I would ever place CHERI as the alternative to the layers of abstraction offered by hypervisors / containers / sandboxes etc, but rather as the enforcement mechanism by which they are achieved. Currently, an ad-hoc combination of MPUs, MMUs, privilege rings, access control lists and compiler instrumentation achieves enforcement. I would like to see more and more of these modified to use a single, flexible enforcement mechanism. Maybe that mechanism is CHERI.
Fully understood the purpose. We're (an open source project team at a very early stage) closely following the research at Cambridge and exploring ways to unify the capability-based security/enforcement approach with some sort of a universal runtime merging/unifying OS, language runtime, virtual machines and distributed process orchestration. We'll definitely take a lot of design inspiration from your and others' work in this lab.