[RFC] enclave-cc: an approach of implementing process-based Confidential Containers
Closed this issue · 36 comments
Motivation
The term "enclave-cc" was first mentioned in the TrustModel of CC. Typical process-based confidential computing hardware TEEs include Intel SGX and RISC-V Keystone. Both refer to their hardware TEEs as enclaves.
"enclave-cc" needs to follow the current CC design principles and trust/threat model, using existing components to implement general process-based Confidential Containers.
Goal
Minimize the changes of UX
The current CC design principle is to reuse the existing container ecosystem to implement Confidential Containers.
Unfortunately, the existing SGX LibOS implementations, e.g., Gramine Shielded Containers and Occlum, require extra steps to wrap LibOS runtimes into the container image during image creation.
Theoretically, LibOS runtimes are not parts of container workload, so the wrapping operation can be removed during image creation.
In addition, removing the wrapping operation from image creation can facilitate standardizing the implementation of process-based Confidential Containers, because the workload protections rely on standard implementations already used in the container ecosystem, such as image encryption and signing.
Reuse Confidential Containers project
At a high level, the CC design requires:
- The container image is always protected, e.g., signed and/or encrypted.
- Image pulling is offloaded to the TEE to protect the confidentiality and integrity of the workload.
- Proof that sensitive workloads are running on a genuine and trusted hardware TEE.
- The non-workload components running inside the TEE need not be provided by the image creator.
An approach to implementing process-based Confidential Containers, called enclave-cc, strictly follows the above design principles.
where:
- The Inclavare Containers project is a container runtime that launches and attests process-based confidential containers for Intel SGX[1,2] and other Enclave Runtimes[3,4,5] through the general PAL APIs. It was introduced[6] in the Kata CC weekly meeting.
- LibOS refers generally to most SGX LibOS implementations, such as Gramine, Occlum, LKL-SGX and so on. enclave-cc needs a general programming model to support various Enclave Runtimes.
Architecture of enclave-cc
enclave-cc introduces the so-called "stub enclave" to host the general image management and attestation functions in an enclave-agent similar to kata-agent. It is hosted by a LibOS and protected by a hardware TEE, such as an Intel SGX Enclave. The "stub enclave" is authenticated by the attestation service of the relying party.
A FUSE encrypted filesystem acts as temporary storage, protecting the metadata and the container bundle. There are many alternatives.
The LibOS running inside the App Enclave needs to establish an attested channel to retrieve the decryption key for mounting the FUSE encrypted filesystem. The attested channel can be established in several ways according to policy:
- launch a local attestation with stub enclave
- delegate stub enclave to proxy a remote attestation (not drawn)
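The policy-driven choice between the two channel types above can be sketched roughly as follows. This is only an illustration: the `Channel` enum, the `select_channel` function and the `require_remote_attestation` policy field are hypothetical names, not part of enclave-cc or any existing component.

```python
from enum import Enum


class Channel(Enum):
    """How the app enclave obtains the FUSE decryption key (hypothetical names)."""
    LOCAL_ATTESTATION = "local"                  # local attestation with the stub enclave
    PROXIED_REMOTE_ATTESTATION = "proxied-remote"  # stub enclave proxies a remote attestation


def select_channel(policy: dict) -> Channel:
    """Pick the attested channel type from a policy dictionary.

    The "require_remote_attestation" key is an assumption made for this
    sketch; the proposal does not define a policy format.
    """
    if policy.get("require_remote_attestation", False):
        return Channel.PROXIED_REMOTE_ATTESTATION
    return Channel.LOCAL_ATTESTATION


print(select_channel({}).value)
print(select_channel({"require_remote_attestation": True}).value)
```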
This proposal is still an early draft. Any feedback is welcome!
References
- [1] Occlum: https://github.com/occlum/occlum/blob/master/docs/rune_quick_start.md
- [2] Graphene: https://github.com/intel/GrapheneSGX-Golang-Support-and-Enhancement/tree/rune_v1.1
- [3] AWS Nitro Enclaves: https://github.com/alibaba/inclavare-containers/tree/master/rune/libenclave/internal/runtime/pal/nitro_enclaves
- [4] kvmtool: https://github.com/alibaba/inclavare-containers/tree/master/rune/libenclave/internal/runtime/pal/kvmtool
- [5] WebAssembly Micro Runtime (WAMR): https://github.com/bytecodealliance/wasm-micro-runtime/tree/main/product-mini/platforms/linux-sgx/enclave-sample/App#wamr-as-an-enclave-runtime-for-rune
- [6] Recording link: https://zoom.us/rec/share/byLXuYK590ZfTRIJFlZvOYgSzBRAuWt_ZcAsduzwGg_DnhfiDh4ZQBxyYis6LNWU.UEn0P62MJYRzY2IG
This is a very good start. I think the approach with the stub enclave makes a lot of sense.
Theoretically, LibOS runtimes are not parts of container workload, so the wrapping operation can be removed during image creation.
I agree that needing to wrap/pre-process a container is a big limitation. How do you propose to remove this requirement?
Are you concerned about hardware limitations regarding multi-process support and context switch overhead? Are these barriers to cloud-native deployments?
cc: @dcmiddle in case you haven't seen.
thanks @fitzthum
Yes I think this looks like a promising direction. It seems to change the libos model from linking an application into the libos, to instead using the libos as a runtime or loader.
The original Gramine-based demo uses a tool (GSC) at container image build time. In that model the user is making an a priori decision to use a container as an enclave and only as an enclave.
This proposal targets a usage model where the decision to use a TEE is at deployment time. One could for instance grab any image out of a registry and target the deployment at a TEE. In this model there are one or more helper enclaves that will handle getting the image into enclave memory.
This removes the "wrapping" previously done at build. Instead there's a general purpose enclave already running waiting to load the image into its protected memory. Then like the usual libos pattern it will call the container's entry point as though the container was a process unto itself (rather than where it now sits like a library in the libos process).
One of the trade-offs here is that in the first case (build time) the enclave's identity is the container, i.e. the end user could get a HW-based attestation of the container and use that for an app-specific purpose. In this proposal, the attestation would be of the stub enclave out to the KBS and maybe there is no remote attestation of the app enclave. This is probably more similar to the existing ccv0/1 use case(s) where an empty VM is attesting that it can securely load a container.
Gramine isn't currently used like this, so we need to get some Gramine expertise on this to make sure we don't lose important security aspects (cf. the manifest). That is probably best done through the Gramine community rather than through this issue. It would probably be helpful to the Gramine community to hear the Occlum experience in adding this mode.
From @fitzthum
I think the approach with the stub enclave makes a lot of sense.
My concern is that this approach mixes image bundle processing and the runtime. Is it possible for one container (the stub enclave) in a pod to process the bundle for another container in the pod? As far as I understand, rune depends on the original image bundle and just modifies it before running the container. In this proposal, how does containerd learn that the app enclave image is already pulled, decrypted and wrapped before it tries to pull?
I agree that needing to wrap/pre-process a container is a big limitation. How do you propose to remove this requirement?
In this proposal the wrapping is done by runtime so it's not removed. So the question becomes is it better to choose the libOS offline and do it before the image is pushed to a registry or select it deployment time and rely on the runtime to do it.
From @dcmiddle
The original Gramine-based demo uses a tool (GSC) at container image build time.
perhaps worth adding that the image contained encrypted elements too which were decrypted runtime. This could be changed to use KBS and other components.
my concern is that this approach mixes the image bundle processing and runtime. Is it possible for a container (stub enclave) in a pod to process the bundle for another container in the pod? as far as I've understood rune depends on the original image bundle and then it just modifies it before running the container. in this proposal, how containerd gets the information that the app enclave is already pulled, decrypted and wrapped before it tries to pull?
Because the stub enclave is launched before the first app image, it is launched through another path rather than the regular image pulling flow.
In this proposal the wrapping is done by runtime so it's not removed. So the question becomes is it better to choose the libOS offline and do it before the image is pushed to a registry or select it deployment time and rely on the runtime to do it.
Wrapping is required for Gramine because of how it works, e.g., integrating the user-defined manifest. Essentially it is used to protect the workload. However, this approach might be a burden for the UX of confidential containers. A FUSE-based fs is a more general approach to storage protection, and image protection can be reused from the CC approaches. Is it possible to use these more general approaches for Gramine? I think filesystem protection is better than the manifest approach. Alternatively, we could try to find a way to keep the manifest approach in enclave-cc.
so it is launched by other approach instead of using the regular image pulling flow.
OK. I don't recall the exact order of when PullImage for the app image happens, but it sounds like my question is not an issue.
I think filesystem protection is better than manifest approach.
Given the container build tools we have today, the fs approach is easier for the user, but I wouldn't necessarily say better. @dcmiddle's comment above is a great summary of the differences and of why the manifest approach is justified in some use cases.
I am the architect of Occlum libOS. I am very happy to see this design to load images with LibOS in SGX.
The current Occlum libOS supports multiple types of encrypted file system and multiprocessing. So Occlum is able to support this design even without any change. Occlum is able to build the FUSE encryption file system in one instance, and then mount it as root in another instance.
Furthermore, Occlum actually has two boot stages. In the first boot stage, the user could verify the boot image or add some files into the original one or even create a totally new one. So Occlum could get the image at the first boot stage, build the second stage root FS with the image, and then boot to the new FS immediately.
As a result, Occlum can support this feature very easily, since it already has all the basic building blocks: file system, multiprocessing, multi-stage booting and the mount syscall.
In this proposal the wrapping is done by runtime so it's not removed. So the question becomes is it better to choose the libOS offline and do it before the image is pushed to a registry or select it deployment time and rely on the runtime to do it.
We've talked about running unmodified workloads as being one of goals of Confidential Containers. In other words users should be able to take encrypted containers that are already in their registries today and run them with CC. That isn't necessarily set in stone and in some ways it might not fit with how people might actually use CC, but it's worth noting.
@dcmiddle points out that the stub approach decouples the measurement of the workload from the measurement of the enclave. This is exactly what we've been aiming to do with Kata; provision a generic secure environment that is easy to measure and then deploy workloads into it.
One of the trade-offs here is that in the first case (build time) the enclave's identity is the container, i.e. the end user could get a HW-based attestation of the container and use that for an app-specific purpose. In this proposal, the attestation would be of the stub enclave out to the KBS and maybe there is no remote attestation of the app enclave. This is probably more similar to the existing ccv0/1 use case(s) where an empty VM is attesting that it can securely load a container.
This is a very crisp summary of the proposal, thanks @dcmiddle.
To me this approach makes sense for mainly 2 reasons:
- It is aligned with the overall architecture of the project. We use the TEE as the secure, attested enclave where we can download and unpack confidential container images for the actual workload to run. The attestation agent and the image crates would be the common SW pieces here.
- It provides a consistent Kubernetes user experience across TEEs, but also compared with the existing, non confidential Kubernetes usage patterns. What's really important to me here is for the container workload to be unmodified (but encrypted/signed) and identical across TEEs.
Gramine isn't currently used like this, so we need to get some gramine expertise on this to make sure we don't lose important security aspects (c.f. manifest).
That would be a very valuable input to get back from the Gramine community.
I would probably be helpful to the Gramine community to hear the Occlum experience in adding this mode.
Is that something we could organize? @jiazhang0 Would you be ready to initiate the discussion and present your proposal to the Gramine community, if that makes sense to you?
enclave-cc introduces the so-called "stub enclave" to host the general image mgmt and attestation functions in enclave-agent similar to kata-agent. It is hosted by a LibOS and protected by running hardware TEEs, such as Intel SGX Enclave. "stub enclave" is authenticated by attestation service by relying party
@jiazhang0 the "stub enclave" looks like a fixed-function container which can be created offline. If the stub enclave could also handle what inclavare's carrier fw does, it sounds like this would also work with runc and kata-runtime, right?
enclave-cc introduces the so-called "stub enclave" to host the general image mgmt and attestation functions in enclave-agent similar to kata-agent. It is hosted by a LibOS and protected by running hardware TEEs, such as Intel SGX Enclave. "stub enclave" is authenticated by attestation service by relying party
if the stub enclave could also handle what inclavare's carrier fw does, it sounds this would also work with runc and kata-runtime, right?
Sorry what does "inclavare's carrier fw does" mean here?
Sorry what does "inclavare's carrier fw does" mean here?
"Carrier is a abstract framework to build an enclave for the specified enclave runtime (Occlum、Graphene ..) ."
My question was that can the "stub enclave" build an enclave for the specified enclave runtime.
Sorry what does "inclavare's carrier fw does" mean here?
"Carrier is a abstract framework to build an enclave for the specified enclave runtime (Occlum、Graphene ..) ."
My question was that can the "stub enclave" build an enclave for the specified enclave runtime.
Carrier is the concept we introduced in shim-rune to launch an automatic image format conversion from standard image format to libos specific format. So theoretically we can borrow this concept to stub enclave, and it can provide support for any libos we want to support.
But I prefer using a secure FUSE filesystem with encryption and integrity protections to store the content from the unpacked standard image, which means there is no libos specific conversion process occurring after unpacking image.
which means there is no libos specific conversion process occurring after unpacking image
The diagram shows libos for the app image. How it gets there?
which means there is no libos specific conversion process occurring after unpacking image
The diagram shows libos for the app image. How it gets there?
The libos is part of the control plane, so it is provided by the CSP rather than by the user. Consider that TDX doesn't support an enclave signature check, yet the payload is still proven to run securely inside the TD guest; the same idea can be reused for SGX enclaves. The libos is signed by the CSP, but it can be verified by the attestation service (not one such as Intel PCCS/PCS, but a service employed by the relying party, such as verdictd, isecl and gop). The key point is that only the libos is launched when the enclave is built, with measurement calculation and signature check. The app payload from the container image is dynamically loaded after the enclave is launched.
@jiazhang0 thanks! a few more questions, partly triggered by my quick replay of [6] today.
The libos is part of control plane. So it is not provided by user but CSP
Can the workload owner bring their own stub enclave? btw, in the diagram the "app enclave" should probably be "stub enclave + app".
The recording talked about a security concern (at 1:11:20->) with Runelet. Is rune/runelet a hard dependency in this proposal, or do you already have a way to mitigate that concern?
Based on what was discussed about multi-process support in the same recording, does it mean that the stub enclave that handles image pulling and attestation must use Occlum today?
What is also not clearly mentioned are the changes needed to, e.g., containerd. Have you verified the flow on a vanilla containerd and CRI, or does this need the same image offload changes as the Kata CC work?
Can the workload owner bring their own stub enclave? btw, in the diagram the "app enclave" should probably be "stub enclave + app".
For a better UX, the user provides only the app payload included in the app container image. So the stub enclave is provided by the CSP as well. Similarly, in Kata CC we would not expect kata-agent to be provided by the user (not a perfect example).
The recording talked about a security concern (at 1:11:20->) with Runelet. Is rune/runelet a hard dependency in this proposal or you already have a way to mitigate that concern?
This security concern is related to EPM (the enclave caching mechanism). runelet doesn't have a hard dependency on EPM and I don't plan to enable it in enclave-cc, at least for now. So we don't need to worry about it at the moment.
Based on what was discussed on multi-process support in the same recording, it means that the stub enclave that handles the image pulling and attestation stub enclave must use Occlum today, right?
No. The libos shown in the diagram is not limited to any specific libos. In fact, if this architecture only cared about Occlum, the enclave-agent program could run inside the app enclave together with the app payloads, because Occlum supports multiple processes. I designed the stub enclave for the needs of Gramine, which doesn't support multiple processes.
What is also not clearly mentioned are changes needed to, e.g., containerd. Have you verified the flow on a vanilla containerd and CRI or this has the same image offload changes needed than the Kata CC work?
Yes. Enclave-CC needs the same modified containerd (cri-plugin) as kata-cc to work.
Thanks for your questions. I will address them in the formal arch doc.
If users want to leverage virtualized SGX in VMs provided by QEMU or Cloud Hypervisor to protect the containers in the VMs, and they use kata-containers to launch them, then I think rune/runelet may not make sense for this case. As far as I know, rune replaces runc in Inclavare Containers and is mostly designed for the bare-metal case. The first diagram is missing the case of using kata-runtime to start confidential containers via SGX in VMs. We know CC-V0 tries to address TDX cases through kata-runtime, but I do not see this kind of design here. In my view, for the solution to be complete, we should also design enclave-cc usage in the VM scenario via kata-runtime.
If users want to leverage virtualized SGX in VMs provided by QEMU or Cloud Hypervisor to protect the containers in the VMs, and they use kata-containers to launch them, then I think rune/runelet may not make sense for this case. As far as I know, rune replaces runc in Inclavare Containers and is mostly designed for the bare-metal case. The first diagram is missing the case of using kata-runtime to start confidential containers via SGX in VMs. We know CC-V0 tries to address TDX cases through kata-runtime, but I do not see this kind of design here. In my view, for the solution to be complete, we should also design enclave-cc usage in the VM scenario via kata-runtime.
Good point. Yes. That is what Kata already did: because SGX is a process-based TEE, its usage is pretty flexible. In that case, SGX is simply passed through to the container inside the Sandbox and the app/libos can use it. But that has the same problem: creating the deployed container image is still a burden if a libos is used. So, as you mentioned, enclave-cc can potentially deal with it in a slightly different way, e.g., by integrating libenclave (similar in function to the libcontainer used by runc) and enclave-agent support inside the Sandbox. For the initial proposal, I want to keep the design basic and simple. In the future, enclave-cc can evolve to support more usages, with new hardware support.
The recording talked about a security concern (at 1:11:20->) with Runelet. Is rune/runelet a hard dependency in this proposal or you already have a way to mitigate that concern?
This security concern is related to EPM (enclave caching mechanism). runelet doesn't hard depend on EPM and I don't plan to enable it in enclave-cc at least in this moment. So we don't need to concern about it now.
OK, so that concern can be mitigated by not using EPM but can you also comment on the dependency to rune/runelet in this enclave-cc proposal.
No. The libos showed in the diagram is not limited to any specific libos.
Great! Is this with the assumption that the libOS supports inclavare PAL? Are there some new PAL changes/functions needed to support enclave-cc?
Are there some new PAL changes/functions needed to support enclave-cc?
Good question, I was trying to dig into Step 6 in the diagram (attestation between the app and stub enclaves). @jiazhang0 Does Step 6 require an addition to Inclavare's PAL-API?
I guess that would be a good use for local attestation.
I think Step 7 might also be new?
Good question, I was trying to dig into Step 6 in the diagram (attestation between the app and stub enclaves). @jiazhang0 Does Step 6 require an addition to Inclavare's PAL-API?
I guess that would be a good use for local attestation.
Yes. Need to add new PAL API to assist exchanging local reports between stub enclave and app enclave.
I think Step 7 might also be new?
Probably. It depends on LibOS. But it is OK to add new API if necessary.
OK, so that concern can be mitigated by not using EPM but can you also comment on the dependency to rune/runelet in this enclave-cc proposal.
From the beginning of inclavare-containers, rune/runelet has been designed to run applications over an SGX LibOS in a container manner. Specifically, rune/runelet tries to address how to launch a LibOS and its payload in a LibOS-agnostic way. This has been achieved. However, a piece of the puzzle was missing: a unified, standard methodology to deploy the payload in a LibOS-agnostic way. Now the CC project gives a good answer for deploying container images in a secure and standard way. So rune/runelet needs to be extended a bit to fully support enclave-cc.
@dcmiddle About the influence caused by fork() in Gramine, I think the below 2 figures have certain enlightenment.
The first figure shows how rune/runelet supports the "exec" command, which originally means launching an app inside a container in runc. For an SGX enclave, rune maps these semantics to launching an app inside an enclave which is hosted by a container process.
As we see, Occlum supports multi-process (where liberpal-occlum.so is the libos-specific component responsible for implementing the PAL APIs). Essentially, runelet employs a Linux thread to host each app inside the enclave. This design ensures a 1:1 mapping between each app inside the enclave and the corresponding Linux thread.
This is the figure showing how rune/runelet supports "exec" command for graphene.
In order to avoid forking the entire host runelet process, rune/runelet would employ a Linux thread to host another instance of the libos (and an enclave). This design still conforms to the principle of keeping a 1:1 mapping between each app inside an enclave and the corresponding Linux thread. But I'm not sure whether Gordon from Intel implemented it this way.
Back to the use of fork() in Gramine: if an app inside the app enclave calls fork(), it is better to handle it in the same way as shown above, because that reduces the complexity of the enclave-cc implementation. If fork() is implemented by forking the entire host process, e.g., runelet, managing multiple runelet instances in a container becomes very complicated. Instead, it is better to create a new Linux thread to host the new instance of the Gramine libos (and the child app inside the corresponding enclave). To further reduce the complexity of enclave-cc, fork() never happens in the stub enclave.
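The 1:1 mapping between enclave apps and host threads described above can be illustrated with a toy model. This is a sketch only: `ToyRunelet`, `launch` and `wait_all` are hypothetical names, plain Python callables stand in for libos instances and enclaves, and nothing here reflects the actual rune/runelet code.

```python
import threading


class ToyRunelet:
    """Toy stand-in for runelet: each enclave app gets exactly one host thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self.threads = {}   # app name -> host thread (the 1:1 mapping)
        self.results = {}   # app name -> value returned by the app

    def launch(self, app_name, entry):
        """Start `entry` on a new thread instead of forking the host process."""
        def host():
            # A real implementation would create a libos instance and enter
            # its enclave here; this sketch just calls a Python function.
            value = entry()
            with self._lock:
                self.results[app_name] = value

        thread = threading.Thread(target=host, name=f"enclave-host-{app_name}")
        with self._lock:
            if app_name in self.threads:
                raise ValueError(f"{app_name} already has a host thread")
            self.threads[app_name] = thread
        thread.start()
        return thread

    def wait_all(self):
        """Block until every hosted app has finished."""
        with self._lock:
            threads = list(self.threads.values())
        for thread in threads:
            thread.join()


runelet = ToyRunelet()
runelet.launch("parent-app", lambda: "parent done")
# Models fork() in the app: a new thread hosts the child, no process fork.
runelet.launch("forked-child", lambda: "child done")
runelet.wait_all()
print(sorted(runelet.results))
```

Keeping the bookkeeping in one process is the point of the design choice: a single runelet tracks all libos instances, rather than several forked runelet processes needing coordination inside one container.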
The first figure show how rune/runelet supports "exec" command, which originally means launch an app inside a container in runc. For sgx enclave, rune maps this semantics to launching an app inside an enclave which is hosted by a container process.
@jiazhang0 how do you see this with regards to what's been discussed with Kata-CC: to limit API end points to prevent untrusted host to exec inside a trusted environment?
The first figure show how rune/runelet supports "exec" command, which originally means launch an app inside a container in runc. For sgx enclave, rune maps this semantics to launching an app inside an enclave which is hosted by a container process.
@jiazhang0 how do you see this with regards to what's been discussed with Kata-CC: to limit API end points to prevent untrusted host to exec inside a trusted environment?
According to current CC security principle, exec command should be limited by a policy which is provisioned by the payload owner through remote attestation. This would be a new feature for rune/runelet.
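A policy-gated exec check of this kind might look roughly like the sketch below. The `ExecPolicy` fields and `is_exec_allowed` are hypothetical and invented for illustration; the actual policy format, and how it is provisioned through remote attestation, are not defined in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecPolicy:
    """Hypothetical policy provisioned by the payload owner (deny by default)."""
    allow_exec: bool = False
    allowed_commands: frozenset = frozenset()


def is_exec_allowed(policy: ExecPolicy, argv: list) -> bool:
    """Return True only if the policy explicitly permits this command."""
    if not policy.allow_exec or not argv:
        return False
    return argv[0] in policy.allowed_commands


default_policy = ExecPolicy()
owner_policy = ExecPolicy(allow_exec=True,
                          allowed_commands=frozenset({"/usr/bin/healthcheck"}))

print(is_exec_allowed(default_policy, ["/bin/sh"]))
print(is_exec_allowed(owner_policy, ["/usr/bin/healthcheck", "--once"]))
print(is_exec_allowed(owner_policy, ["/bin/sh"]))
```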
@mythi By the way, the example just shows how rune/runelet manages multiple LibOS instances and gives the recommendation on handling fork() called by app. Exec needs to be carefully handled in Enclave-CC.
I get the sense we have general consensus on the direction here. What we are now getting into are lower-level design decisions, and we might better document and discuss those in separate issues by component. Or better yet, as PRs to a design document in this documentation repo.
Additionally while some components are in inclavare, occlum (and possibly gramine), others like the stub and application enclaves are new components and seem naturally housed in this project. I recommend that we create a new repo to start building those.
Additionally while some components are in inclavare, occlum (and possibly gramine), others like the stub and application enclaves are new components and seem naturally housed in this project. I recommend that we create a new repo to start building those.
I fully agree. In order to share that intention with the rest of the community I created confidential-containers/confidential-containers#6 to get a slightly more formal approval before creating it. Please comment on the issue there and we'll move forward and create the new repo.
From a security perspective, I think we need to make sure that when the image decryption key is released to the stub enclave, 1) the key is wrapped with the attested stub enclave's wrapping key, 2) the FUSE file system key is generated inside the attested stub enclave, and 3) the stub enclave verifies the app enclave.
I think step 6 may happen earlier before step 3 such that the app enclave is launched before key request is made, so that app enclave may be verified by the remote attestation service as well. so mutual attestation between the stub enclave and app enclave may not be needed? this will also avoid sequential execution to save launch time.
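Conditions 2) and 3) from the list above can be modelled as simple state checks. This is a sketch under stated assumptions: `ToyStubEnclave` and its methods are hypothetical, attestation is reduced to a flag and a byte-string measurement comparison, and condition 1) (key wrapping) is left out because it needs a real cryptographic library.

```python
import hmac
import secrets


class ToyStubEnclave:
    """Toy model of the stub enclave's key handling (not real attestation)."""

    def __init__(self, trusted_app_measurements):
        self.attested = False
        self._trusted = list(trusted_app_measurements)
        self._fuse_key = None

    def mark_attested(self):
        # Stand-in for a successful remote attestation of the stub enclave.
        self.attested = True

    def generate_fuse_key(self):
        # Condition 2: the FUSE key is generated only inside an attested stub enclave.
        if not self.attested:
            raise PermissionError("stub enclave has not been attested")
        self._fuse_key = secrets.token_bytes(32)

    def release_fuse_key(self, app_measurement: bytes) -> bytes:
        # Condition 3: the stub enclave verifies the app enclave before release.
        if self._fuse_key is None:
            raise RuntimeError("FUSE key has not been generated")
        if not any(hmac.compare_digest(app_measurement, m) for m in self._trusted):
            raise PermissionError("app enclave measurement is not trusted")
        return self._fuse_key


stub = ToyStubEnclave([b"expected-app-measurement"])
stub.mark_attested()
stub.generate_fuse_key()
key = stub.release_fuse_key(b"expected-app-measurement")
print(len(key))
```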
from security perspective, I think we need to make sure when image decryption key is to be released to the stub enclave, the key is 1) wrapped with attested stub enclave wrapping key, and 2) the fuse file system key is generated inside the attested stub enclave, 3) the stub enclave will verify the app enclave.
Yes. These points should be emphasized. I will add them to the formal PR: confidential-containers/enclave-cc#1 Thanks!
I think step 6 may happen earlier before step 3 such that the app enclave is launched before key request is made, so that app enclave may be verified by the remote attestation service as well. so mutual attestation between the stub enclave and app enclave may not be needed? this will also avoid sequential execution to save launch time.
Step 6 depends on step 5, which happens after step 1. Steps 1, 2 and 5 are ordered from the perspective of the K8s control plane, so moving step 6 to right after step 2 means the app enclave is launched at the pullImage() step, implying there is no chance to retrieve the container annotations for the app enclave, because that happens in CreateContainer() as step 5. In addition, RA for the app enclave is slower than LA, so the launch time would actually increase.
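The ordering argument can be checked mechanically with a small sketch. It encodes only the constraints stated in this reply (1 before 2, 2 before 5, 5 before 6); the positions of steps 3 and 4 in the example orders are assumptions, since the diagram is not reproduced here, and `MUST_PRECEDE` and `violations` are hypothetical names.

```python
# Ordering constraints taken from the reply: step 1 -> step 2 -> step 5 -> step 6.
MUST_PRECEDE = [(1, 2), (2, 5), (5, 6)]


def violations(order):
    """Return the (earlier, later) constraints that a proposed step order breaks."""
    position = {step: index for index, step in enumerate(order)}
    return [
        (earlier, later)
        for earlier, later in MUST_PRECEDE
        if earlier in position and later in position
        and position[earlier] > position[later]
    ]


original_order = [1, 2, 3, 4, 5, 6]
proposed_order = [1, 2, 6, 3, 4, 5]  # step 6 moved to right after step 2

print(violations(original_order))
print(violations(proposed_order))
```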
The design initiated in this issue is now merged in the enclave-cc repo.
https://github.com/confidential-containers/enclave-cc/blob/main/docs/design.md
This RFC issue is nearing completion. I think the one remaining thing would be to get a PR into this docs repo describing the adopted scope and the enclave-cc repo.
After some reorganization of the .github readme layout and the docs in this repo I don't see an obvious place for this content anymore. For now I will close this issue and keep an eye out for a useful place to make this portion of the Confidential Containers functionality visible apart from the enclave-cc repo.