draperlaboratory/hope-policy-engine

Expanding `.dover_metadata`

Opened this issue · 2 comments

The Lincoln Labs team thinks it will be desirable in the long-term to be able to include arbitrary tags in the .dover_metadata ELF section.

Background

The existing semantics of the .dover_metadata section are explained in this
LLVM header
and implemented in this Python script.

Currently, tags here are specified by a fixed set of byte-sized integer
constants. The runtime that interprets this ELF section maps these integers back
into policy-specific strings. Thus we suggest simply storing the policy strings
in this section to begin with rather than creating a hard-coded integer/string
mapping that must be maintained in several places.

Design

More specifically, we propose a new operation, DMD_DECLARE_TAG. This
operation, followed by an integer (uleb128) and then a null-terminated string
(asciiz), dynamically declares a mapping from that integer to the string. This
allows the other operations to keep identifying tags by integer, unchanged.

We propose removing the original hard-coded mapping of 10 integers to specific
llvm.* policy tags in favor of using this new operation. However, this is
optional--the operation could instead be used only to declare mappings for
integers outside the hard-coded range.

This operation would be permitted anywhere in the metadata stream. However, use
of a tag integer before it has been declared would be an error.
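
To make the proposed record layout concrete, here is a minimal Python sketch of writing and reading one DMD_DECLARE_TAG record (opcode byte, uleb128 integer, asciiz string), including the "use before declaration is an error" rule. The opcode value 0x7F, the function names, and the tag name "llvm.example" are all made up for illustration; the real opcode numbering would come from the LLVM header.

```python
# Hypothetical opcode value, chosen only for this sketch.
DMD_DECLARE_TAG = 0x7F


def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("uleb128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(buf, pos):
    """Decode one uleb128 value at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_declare_tag(tag_id, name):
    """Build one record: opcode, uleb128 tag id, null-terminated ASCII name."""
    return (bytes([DMD_DECLARE_TAG]) + encode_uleb128(tag_id)
            + name.encode("ascii") + b"\x00")


def read_declare_tag(buf, pos, declared):
    """Parse one record at buf[pos], record it in `declared`, return next_pos."""
    if buf[pos] != DMD_DECLARE_TAG:
        raise ValueError("not a DMD_DECLARE_TAG record")
    tag_id, pos = decode_uleb128(buf, pos + 1)
    end = buf.index(b"\x00", pos)  # raises ValueError if unterminated
    declared[tag_id] = buf[pos:end].decode("ascii")
    return end + 1


def lookup_tag(tag_id, declared):
    """Resolve a tag integer used by another operation; undeclared is an error."""
    if tag_id not in declared:
        raise KeyError("tag %d used before it was declared" % tag_id)
    return declared[tag_id]
```

Because declarations may appear anywhere in the stream, a reader along these lines would have to process records in order and keep the `declared` table as running state.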

We agree that the .dover_metadata ELF section should be able to
include arbitrary tags. In fact we used to support this, and the
reason tags are represented as integers is related. This section
used to have a totally different use case that I think it will be
instructive to explain, because we'll need to revive it soon. So
bear with me a little.

History: Dynamic Loading

Currently, all tag initialization happens at boot time. But
suppose you want to support dynamic loading of applications, as
in a general purpose OS like Linux or BSD. In this case, we
can't set up tag memory ahead of time, so the ELF file to be
loaded will need to carry some information about how its tag
memory should be set up. That information will need to be
communicated from the AP that is loading the program to the PEX.

Problem: There is no communication between the AP and the PEX; in
fact, they can't even see each other's memory. So, we can't just
stick the taginfo file in an ELF section and have the PEX read
it, because the ELF file is over in AP memory. Our solution was
a neat hack: At boot time, we set up a big array in memory, let's
call it tag_array. tag_array contained all 0s, but each
0 had different metadata. Every possible metadata tag was
applied to one location in tag_array.

Now in the .dover_metadata section of the ELF file, we record
the metadata for each instruction as a number. We add a little
code to the ELF loader: when an instruction is loaded to some
address x, we find the number corresponding to this instruction
in the .dover_metadata section. Suppose the number is foo.
Then we do:

*x = *x | tag_array[foo]

This doesn't change the value at x because tag_array contains
all 0s. However, we have a special policy installed that notices
when this bit of the loader executes, and it copies the metadata
from tag_array[foo] over to x. In this way, the
PEX "notices" the loader is running and sets up the metadata for
the dynamically loaded program appropriately.
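
As a rough illustration of the trick described above, here is a toy Python model. It is not the real loader or policy: the class and function names are invented, memory is modeled as two parallel lists (values the AP can see, tags only the PEX can see), and the "policy" is collapsed into a single assignment.

```python
class TaggedMemory:
    """Toy model: every word has a value (AP-visible) and a tag (PEX-only)."""

    def __init__(self, size):
        self.values = [0] * size
        self.tags = [None] * size


def setup_tag_array(mem, base, all_tags):
    """Boot-time step: one zero word per possible tag, each carrying that tag."""
    for i, tag in enumerate(all_tags):
        mem.values[base + i] = 0
        mem.tags[base + i] = tag


def loader_or(mem, x, tag_array_base, foo):
    """Model of `*x = *x | tag_array[foo]` together with the policy's reaction."""
    src = tag_array_base + foo
    # What the AP does: OR with a zero word, so the loaded value is unchanged.
    mem.values[x] = mem.values[x] | mem.values[src]
    # What the special policy does when it sees this instruction:
    # copy the metadata from tag_array[foo] onto the word at x.
    mem.tags[x] = mem.tags[src]


mem = TaggedMemory(16)
setup_tag_array(mem, 8, ["tag.a", "tag.b"])  # made-up tag names
mem.values[0] = 0x1234                       # an "instruction" just loaded at x = 0
loader_or(mem, 0, 8, 1)                      # foo = 1 read from .dover_metadata
```

After the call, the word at address 0 still holds 0x1234 but now carries the tag that was attached to tag_array[1].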

This is a hack, but it doesn't have much performance impact and
allows us to set up metadata for dynamically loaded applications
without fundamentally changing the interface between the AP and
the PEX.

Going forward

We don't currently support dynamic loading, but getting BSD
working is on our near-term roadmap (late 2019/early 2020). So,
when we add back support for arbitrary metadata in the
.dover_metadata section, I think it's important that we do it
in a way that is sufficiently general to support both the use
case you've identified and this dynamic loading use case.

Notice that our loader hack fundamentally requires out-of-band
agreement between whoever created the ELF file and the PEX
kernel, since the PEX kernel needs to set up the tag_array at
boot time and needs a fixed unique integer for each possible tag.
For this reason, it's not obvious to me that it's possible to
completely do away with some coordination here. That said, I'm
not opposed to a cleaner .dover_metadata section.

We plan to get to this work in a few months. If you guys need it
before then, we're happy to take patches, with the caveat that we
don't want to introduce big .dover_metadata changes that we're
going to have to redesign again when we add back the dynamic loading.

Writing up some notes from our in-person meeting last Friday. This is my takeaway, which is probably erroneous. It sounds like this issue is being postponed for the time being, but this should help continue the conversation where we left off.

It seems the consensus was that there are two orthogonal features that could be served by an ELF metadata section: 1) a symbolic description of which tags should be applied to which code/data and 2) a concrete data structure of these tags that is suitable for loading by the validator or dynamic loader.

The first, the symbolic section, would be emitted by the compiler (or other binary analysis tools) to describe which functions, instructions, or data structures should receive which tags. This would potentially involve the tag names as strings. The second, the concrete section, would be emitted after linking (potentially by a separate utility) when producing a final executable.

Another focus of discussion was preserving maintainability by having only one source of truth for the mapping of symbolic tag strings to concrete integers usable by the PEX/validator. We discussed having the policy-tool compiler generate this mapping in some sort of configuration file as an artifact of the .dpl policy description.
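
To sketch how the two sections and the single mapping could fit together: the post-link utility would read the generated mapping and lower symbolic (address, tag name) entries to concrete (address, integer) entries. Nothing here was decided at the meeting. The JSON format, the tag names, and the function names below are assumptions made purely for illustration.

```python
import json

# Hypothetical contents of the mapping file generated from the .dpl policy.
MAPPING_JSON = '{"llvm.example_a": 1, "llvm.example_b": 2}'


def load_tag_mapping(text):
    """Load the name-to-integer mapping and check that the integers are unique."""
    mapping = json.loads(text)
    ids = list(mapping.values())
    if len(set(ids)) != len(ids):
        raise ValueError("tag integers must be unique")
    return mapping


def lower_symbolic(symbolic, mapping):
    """Turn [(address, tag_name), ...] into [(address, tag_integer), ...]."""
    concrete = []
    for address, name in symbolic:
        if name not in mapping:
            raise KeyError("tag %r is not defined by the policy" % name)
        concrete.append((address, mapping[name]))
    return concrete


mapping = load_tag_mapping(MAPPING_JSON)
concrete = lower_symbolic([(0x1000, "llvm.example_b")], mapping)
```

With this split, the compiler never needs to know the integers, and the PEX/validator never needs to know the strings; only the lowering step consults the generated file.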