Redesign

Question


npmccallum opened this issue 3 years ago · 2 comments

npmccallum commented 3 years ago

🚧🚧🚧 WORK IN PROGRESS 🚧🚧🚧

Problems with the Current Implementation

As it stands today, sallyport works but suffers from a number of shortcomings.

Only a single syscall can be proxied to the host per guest exit, so every proxied syscall pays the full cost of an exit. It would be desirable to support batching or "smuggling" (i.e. implicitly conjoining two unrelated syscalls). Batching is less pressing today. However, smuggling could be used to piggyback time-related syscalls onto other exits in order to update an internal map between an instruction counter and the system time. This would dramatically decrease the time spent on each request for the keep's current time.
There is no way to tell from the sallyport itself whether it contains valid syscalls to execute during a keep exit. As a result, the main loops have to infer by other means whether the contents of the sallyport are valid to process. This is not good for safety.
Maintaining a host-side allowlist of syscalls is currently left to the keep loader backends, which currently allow all syscalls. It would be better if sallyport knew all of its implemented syscalls and permitted only those.
Address translation is currently left to each backend to implement. This results in strange code, such as in the kvm backend, where the guest has knowledge of host addresses in order to translate addresses correctly. This is terrible for security, since it means the host has to validate that every address is safe.

Design Requirements

Like before, sallyport needs a defined mechanism both for holding data viewable by the host and for passing syscall requests.
Sallyport should support placing multiple syscalls in the sallyport at once.
The host should be able to inspect sallyport memory in a safe way to determine how many syscalls are being requested.
Sallyport should encode all syscall pointers as offsets into the sallyport buffer. The guest knows the address at which it maps the sallyport, so it can convert an offset to an address. Likewise, the host knows its own address for the sallyport, so it too can convert an offset to an address. This also makes pointer validation easy, since a pointer is only valid if it points into the sallyport.
Sallyport should provide host and guest interfaces for interacting with the sallyport. These interfaces should be rich and have knowledge of how to serialize each syscall. This allows us to effectively share syscall proxying code between technologies.
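To illustrate the offset requirement, here is a minimal sketch of how either side could turn a sallyport offset into an address in its own address space. The helper name `translate` and its signature are hypothetical; the point is that a single bounds check against the block length is all the pointer validation either side needs.

```rust
/// Converts a sallyport offset into an address in the caller's own
/// address space (hypothetical helper, for illustration only).
///
/// `base` is where this side has the block mapped and `block_len` is the
/// block's length in bytes. Returns `None` unless the whole region
/// `[offset, offset + len)` lies inside the block.
pub fn translate(base: usize, block_len: usize, offset: usize, len: usize) -> Option<usize> {
    // Checked arithmetic: a hostile offset must not be able to wrap around.
    let end = offset.checked_add(len)?;
    if end > block_len {
        return None;
    }
    base.checked_add(offset)
}
```

Because the guest and the host map the same block at different addresses, each passes its own `base`, and the same offset resolves to the same byte on both sides.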

Data Types

This section defines the data types in the sallyport system. It uses traits or type aliases as a common way to express how the data types should look. However, this does not imply that they should actually be traits or type aliases unless otherwise noted. Often they will just be structs whose implemented methods have signatures similar to the proposed trait.

Word

A word is a pointer-sized integer. This is simply another way of expressing usize.

Block

  ┌──────┐
  │ Size │
  ├──────┤
  │      │
  │ Data │
  │  ... │
  │      │
  ├───┬──┤
  │   │  │
  │   ▼  │
  │  ▲   │
  │  │   │
  ├──┴───┤
  │ Reps │
  ├──────┤
  │      │
  │ Reqs │
  │      │
  └──────┘
   Fig. A

A sallyport block is an array of words (i.e. &[usize]). Expressing the block this way solves both alignment and code-simplicity issues. However, it is really just a region of memory: as you will see, when we allocate space for data in the block we will call .align_to::<u8>(), so this is really just a block of bytes expressed as words.

The full form of the block is shown in Figure A above. At the top of the block is the size word, which identifies how many request/response pairs there are at the bottom of the block. Below the size word is a region reserved for storing data. Requests or responses can reference this data by giving its offset into the block.

Note that during block content construction, the data and request/response sections grow towards each other. Care MUST be taken to ensure these sections never overlap.
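The overlap rule can be made concrete with a little arithmetic. This is a sketch that assumes, as Figure A suggests, that the response section sits directly above the request section and that both are contiguous. With n pairs the bottom of the block holds 8n request words and 2n response words, so data may only use the words between the size word and the start of the responses. The function names are hypothetical.

```rust
/// Words per request and per response, from the `Request` and `Response` types.
const REQUEST_WORDS: usize = 8;
const RESPONSE_WORDS: usize = 2;

/// Index of the first word of the response section for a block of
/// `block_words` words holding `n` request/response pairs, or `None`
/// if the pairs would not fit below the size word (word 0).
fn responses_start(block_words: usize, n: usize) -> Option<usize> {
    let tail = n.checked_mul(REQUEST_WORDS + RESPONSE_WORDS)?;
    let start = block_words.checked_sub(tail)?;
    if start == 0 {
        None
    } else {
        Some(start)
    }
}

/// True if `data_words` words of data (stored from word 1 onward)
/// fit without overlapping the response and request sections.
fn fits(block_words: usize, n: usize, data_words: usize) -> bool {
    match responses_start(block_words, n) {
        Some(start) => data_words < start,
        None => false,
    }
}
```

For example, a 512-word block (4 KiB on a 64-bit target) carrying three pairs reserves its last 30 words, leaving words 1 through 481 for data.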

Request

type Request = [usize; 8];

A request is simply eight contiguous words. The first word defines the contents of the remaining 7 words.

SysCall

const SYSCALL: usize = 0;

If the first word contains a SYSCALL value, the second word contains a syscall number and the remaining six words contain the platform's syscall registers in the platform defined order. For example, on x86_64-unknown-linux-gnu the format of the request would be:

request[0] == SYSCALL;
request[1] == rax;
request[2] == rdi;
request[3] == rsi;
request[4] == rdx;
request[5] == r10;
request[6] == r8;
request[7] == r9;

All registers which contain pointer values MUST be expressed as an offset in the sallyport to the data rather than an absolute address.
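As a sketch of this encoding, the following builds a write(1, buf, 5) request for x86_64-unknown-linux-gnu, where buf is not a guest address but the offset (here an arbitrary 8) of bytes the guest has already copied into the data section. The `syscall_request` helper is hypothetical; 1 is the SYS_write number on x86_64 Linux.

```rust
/// A request is eight contiguous words; the first word is its kind.
type Request = [usize; 8];
const SYSCALL: usize = 0;

/// Builds a SYSCALL request from a syscall number and up to six argument
/// registers (rdi, rsi, rdx, r10, r8, r9 on x86_64 Linux). Unused
/// argument registers are left as zero. Pointer arguments must already
/// be offsets into the sallyport, not absolute addresses.
fn syscall_request(nr: usize, args: &[usize]) -> Request {
    assert!(args.len() <= 6, "a syscall takes at most six arguments");
    let mut req = [0usize; 8];
    req[0] = SYSCALL;
    req[1] = nr;
    req[2..2 + args.len()].copy_from_slice(args);
    req
}
```

Calling `syscall_request(1, &[1, 8, 5])` yields `[0, 1, 1, 8, 5, 0, 0, 0]`: kind, rax, rdi (fd), rsi (buffer offset), rdx (length).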

Response

type Response = [usize; 2];

Each request is paired with its corresponding response. These can be correlated by their index (the first response correlates with the first request). The formatting of the response is platform- and request-type-dependent. For example, on x86_64-unknown-linux-gnu the response words are rax and rdx, respectively, and negated errno values are returned in rax as -4095 through -1 (the highest 4095 values of the word).
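The guest code further down calls a `response_to_result` function that this issue never defines. One plausible definition for x86_64 Linux, assuming errors are reported as a positive errno in the `Err` variant, is sketched here.

```rust
use core::ffi::c_int;

/// Decodes rax from a Linux/x86_64 response. Values from -4095 to -1
/// (the highest 4095 values of a usize) are negated errno codes;
/// everything else is a successful return value.
fn response_to_result(rax: usize) -> Result<usize, c_int> {
    if rax >= (-4095isize) as usize {
        // Reinterpret as signed and negate to recover the errno.
        Err(-(rax as isize) as c_int)
    } else {
        Ok(rax)
    }
}
```

Note that -4096 is not an error under this convention, which is why the check starts at -4095.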

Phases

The block has three states which correspond with three phases:

Start - the guest begins forwarding requests to the host
Exit - the host receives control from the guest
Return - the guest receives control back from the host

We will define the contents of the sallyport at each state and then explain what the correlated phase of code must do to transition to the next state.

Start Phase

When the guest is ready to forward requests to the host, the sallyport contents are undefined. The guest begins by writing a 0 to the size word. The block is now empty.

For each request to be forwarded over the sallyport, the guest should:

Allocate and write any data needed to the data section.
Append a new request to the request section.
Increment size.

Once all data and requests have been written, the guest exits to the host.

Care MUST be taken by the guest to ensure that the data and request/response sections never overlap. This implies that the guest MUST leave sufficient space between the data and request sections for the host to write one response per request (size responses in total).
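The three Start-phase steps above could look roughly like the following guest-side builder. It is a sketch under stated assumptions, not sallyport's API: the `Builder` type is hypothetical, offsets are byte offsets from the start of the block, data is padded to whole words, request 0 occupies the last eight words with later requests stacked upward, and room for 2 response words per request is reserved above the requests.

```rust
const REQUEST_WORDS: usize = 8;
const RESPONSE_WORDS: usize = 2;

/// Hypothetical guest-side writer for the Start phase.
struct Builder<'a> {
    block: &'a mut [usize],
    /// Index of the next free word in the data section (word 0 is size).
    data_next: usize,
}

impl<'a> Builder<'a> {
    /// The block contents are undefined, so start by writing 0 to size.
    fn new(block: &'a mut [usize]) -> Self {
        block[0] = 0;
        Builder { block, data_next: 1 }
    }

    fn size(&self) -> usize {
        self.block[0]
    }

    /// True if data ending at word `data_end` leaves room for `n`
    /// requests plus the `n` responses the host will write.
    fn fits(&self, data_end: usize, n: usize) -> bool {
        n.checked_mul(REQUEST_WORDS + RESPONSE_WORDS)
            .and_then(|tail| data_end.checked_add(tail))
            .map_or(false, |needed| needed <= self.block.len())
    }

    /// Step 1: copy `bytes` into the data section, padded to whole words.
    /// Returns the byte offset of the data within the block.
    fn push_data(&mut self, bytes: &[u8]) -> Option<usize> {
        let w = core::mem::size_of::<usize>();
        let end = self.data_next.checked_add((bytes.len() + w - 1) / w)?;
        // Also reserve room for the request this data belongs to.
        if !self.fits(end, self.size() + 1) {
            return None;
        }
        for (i, chunk) in bytes.chunks(w).enumerate() {
            let mut word = 0usize.to_ne_bytes();
            word[..chunk.len()].copy_from_slice(chunk);
            self.block[self.data_next + i] = usize::from_ne_bytes(word);
        }
        let offset = self.data_next * w;
        self.data_next = end;
        Some(offset)
    }

    /// Steps 2 and 3: append a request at the bottom and increment size.
    fn push_request(&mut self, request: [usize; REQUEST_WORDS]) -> Option<()> {
        let n = self.size();
        if !self.fits(self.data_next, n + 1) {
            return None;
        }
        let start = self.block.len() - (n + 1) * REQUEST_WORDS;
        self.block[start..start + REQUEST_WORDS].copy_from_slice(&request);
        self.block[0] = n + 1;
        Some(())
    }
}
```

Both push methods refuse to write anything that would break the overlap rule, so the guest can only exit with a block the host has room to answer.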

Exit Phase

When the host receives control from the guest, the host has no knowledge of the contents of the data section. It only knows that the bottom of the block contains size requests.

After the host sanity-checks the size value, it should execute each syscall in order and write each response to the block. The host MUST NOT execute the syscalls naively. It must check every "pointer" register (expressed as an offset into the block) to ensure it points into the data region before converting it to a proper host address. Then the host should perform any other syscall-specific validation before executing the syscall.

If an unknown syscall is requested, the host should respond with ENOSYS.
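A host-side check for the Exit phase might be sketched as follows. Everything here is an assumption for illustration: the function names, the hard-coded Linux errno values and x86_64 syscall numbers, the Figure A layout (2 response words and 8 request words per pair at the bottom), and the choice of EFAULT for an out-of-range offset, which this issue does not specify. Only read and write are on the allowlist.

```rust
const SYSCALL: usize = 0;
// Linux errno values and x86_64 syscall numbers, hard-coded for the sketch.
const EFAULT: i32 = 14;
const ENOSYS: i32 = 38;
const SYS_READ: usize = 0;
const SYS_WRITE: usize = 1;

/// Sanity-checks the size word: returns the data region as a byte range
/// [start, end), or None if `n` pairs cannot fit in the block.
fn data_region(block_words: usize, n: usize) -> Option<(usize, usize)> {
    let w = core::mem::size_of::<usize>();
    let tail = n.checked_mul(8 + 2)?;
    let end = block_words.checked_sub(tail)?;
    if end == 0 {
        return None;
    }
    Some((w, end * w))
}

/// Decides whether a request may be executed. Err(errno) means the host
/// should write that error as the response instead of calling the syscall.
fn check_request(req: &[usize; 8], (start, end): (usize, usize)) -> Result<(), i32> {
    if req[0] != SYSCALL {
        return Err(ENOSYS);
    }
    match req[1] {
        // read(fd, buf, count) and write(fd, buf, count): buf is an offset.
        SYS_READ | SYS_WRITE => {
            let (offset, count) = (req[3], req[4]);
            let last = offset.checked_add(count).ok_or(EFAULT)?;
            if offset < start || last > end {
                return Err(EFAULT);
            }
            Ok(())
        }
        // Anything not on the allowlist is refused.
        _ => Err(ENOSYS),
    }
}
```

Only after `check_request` succeeds would the host add its own base address to the offset and perform the real syscall.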

Return Phase

When the host returns control to the guest, the guest MUST NOT presume that any sallyport values are valid. Therefore, it MUST sanity-check all input values. For example, the guest should first validate that the size value is unmodified, and likewise that the request section is unmodified.

After validation, the guest should iterate through request/response pairs, continuing to validate all nested input data. For example, if a "pointer" type isn't expressed as a valid offset in the sallyport, the guest should immediately stop all further execution.
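The first Return-phase checks could be sketched like this, assuming the guest kept a private copy of the requests it sent and that request i sits at the i-th eight-word slot from the bottom of the block. The function name is hypothetical.

```rust
/// Return-phase check (hypothetical helper): after the host returns,
/// confirm that the size word and every request the guest wrote are
/// unchanged. `sent` is the guest's private copy of the requests it sent.
fn requests_unmodified(block: &[usize], sent: &[[usize; 8]]) -> bool {
    // The size word must still equal the number of requests sent.
    if block.first() != Some(&sent.len()) {
        return false;
    }
    // Request i occupies the i-th eight-word slot counting up from the bottom.
    sent.iter().enumerate().all(|(i, req)| {
        match block.len().checked_sub((i + 1) * 8) {
            Some(start) => block[start..start + 8] == req[..],
            None => false,
        }
    })
}
```

If this returns false, the guest should treat the host as hostile and stop, as with the attacked() method below.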

Guest Side

Platform

use libc::c_int;

trait Platform {
    /// Suspend guest execution and pass control to the host.
    /// This function returns when the host passes control back to the guest.
    fn sally(&mut self) -> Result<(), c_int>;

    /// Validates that a region of memory is valid.
    /// Returns a pointer if valid, otherwise `EINVAL`.
    fn validate<T: Copy>(&self, ptr: usize, len: usize) -> Result<*const T, c_int>;
}

This is a real trait: it is what each technology (e.g. kvm, sgx) needs to implement.
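To make the contract of Platform more tangible, here is a hypothetical test double, not a real backend. It assumes guest memory is a single window [base, base + size), that len counts elements of T, and that EINVAL has its Linux value of 22. It repeats the trait so the sketch is self-contained.

```rust
use core::ffi::c_int;

const EINVAL: c_int = 22; // Linux value

trait Platform {
    fn sally(&mut self) -> Result<(), c_int>;
    fn validate<T: Copy>(&self, ptr: usize, len: usize) -> Result<*const T, c_int>;
}

/// Hypothetical test double: guest memory is one window [base, base + size).
struct MockPlatform {
    base: usize,
    size: usize,
    exits: usize,
}

impl Platform for MockPlatform {
    fn sally(&mut self) -> Result<(), c_int> {
        // A real backend would exit to the host here; the mock only counts exits.
        self.exits += 1;
        Ok(())
    }

    fn validate<T: Copy>(&self, ptr: usize, len: usize) -> Result<*const T, c_int> {
        // Assumes `len` counts elements of T, so the byte length is len * size_of::<T>().
        let bytes = len.checked_mul(core::mem::size_of::<T>()).ok_or(EINVAL)?;
        let end = ptr.checked_add(bytes).ok_or(EINVAL)?;
        let misaligned = ptr % core::mem::align_of::<T>() != 0;
        if misaligned || ptr < self.base || end > self.base.saturating_add(self.size) {
            return Err(EINVAL);
        }
        Ok(ptr as *const T)
    }
}
```

A double like this lets the guest-side Handler logic be exercised without kvm or sgx.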

Handler

use core::slice::from_raw_parts_mut;
use libc::{c_int, c_long};

struct Handler<P: Platform> {
    platform: P,
    // ...
}

impl<P: Platform> Handler<P> {
    /// Create a new `Handler`.
    pub fn new(block: &mut [usize], platform: P) -> Self { /* ... */ }

    /// Called when the host has returned invalid data. Never returns.
    pub fn attacked(&mut self) -> ! {
        // Loop in case the host tries to reenter
        loop {
            // Try to exit...
            self.exit(1);
        }
    }

    /// Takes in the syscall registers, constructs the relevant
    /// data types from them and calls the correct method below.
    ///
    /// # Safety
    ///
    /// This method is unsafe because it interprets registers as
    /// the correct data types. However, in an actual implementation
    /// it might be safe if we can validate the inputs.
    unsafe fn syscall(&mut self, registers: [usize; 7]) -> Result<[usize; 2], c_int> {
        // libc::SYS_* constants are c_long, so convert before matching.
        match registers[0] as c_long {
            libc::SYS_read => {
                let fd = registers[1] as c_int;
                let len = registers[3];
                let ptr = self.platform.validate::<u8>(registers[2], len)?;
                let buffer = from_raw_parts_mut(ptr as *mut u8, len);
                Ok([self.read(fd, buffer)?, 0])
            }

            // ...

            _ => Err(libc::ENOSYS),
        }
    }

    /// Execute a read syscall...
    pub fn read(&mut self, fd: c_int, buffer: &mut [u8]) -> Result<usize, c_int> {
        // Allocate buffer.len() bytes in the data section.
        let offset_in_block = self.allocate(buffer.len());

        // Append request
        self.append(&[
            SYSCALL,
            libc::SYS_read as usize,
            fd as usize,
            offset_in_block,
            buffer.len(),
        ]);

        // Pass control to the host and wait for it to return.
        self.leave();

        // Validate the return value. Decode it first, so that an
        // error code is not mistaken for an oversized byte count.
        let response = match self.responses().next() {
            Some(response) => response,
            None => self.attacked(),
        };
        let result = response_to_result(response[0]);
        if let Ok(n) = result {
            if n > buffer.len() {
                self.attacked()
            }
            // ... copy the first n bytes at offset_in_block into buffer.
        }

        result
    }

    /// Other syscall methods...
    pub fn write(&mut self, fd: c_int, buffer: &[u8]) -> Result<usize, c_int> { /* ... */ }
    // ...
}

The Handler instance is the guest's interface with sallyport. The guest can call a syscall method such as read() directly, or it can use the convenience method syscall() to pass in the raw registers from a syscall.

Answer 1 · 2021-12-14T18:08:07.000Z

This is the redesign of the sallyport block we finalized today. It is only minor changes from Roman's newest PR.

===============================================

The sallyport block is a region of memory containing zero or more items. All items contain the following header:

size: usize
kind: usize

The size field gives the full length of the item, excluding the header. The contents of the item are defined by the value of the kind field. An item with an unknown kind can be skipped, since its length is known from the size field. The recipient of an item with an unknown kind MUST NOT try to interpret or modify its contents in any way.

Kinds

END: 0
SYSCALL: 1
...

End

An END item MUST have a size of 0. It has no contents and simply marks the end of the items in the block, communicating the end of the item list to the host. However, the guest MUST NOT rely on the terminator being present when control returns to it.
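A walk over the item list could look like the sketch below. The issue does not say what unit size is measured in, so this assumes (hypothetically) bytes, always a whole number of words. The walker stops at END or at the end of the block, so it stays safe on the guest side even if the host removed the terminator, and it skips unknown kinds without reading their contents.

```rust
const END: usize = 0;
const SYSCALL: usize = 1;

/// Walks the items in a block and returns the bodies of all SYSCALL items.
/// Assumes `size` counts bytes and is a whole number of words.
/// Returns None if any header is inconsistent with the block.
fn syscall_items(block: &[usize]) -> Option<Vec<&[usize]>> {
    let w = core::mem::size_of::<usize>();
    let mut found = Vec::new();
    let mut i = 0;
    // Also stop at the end of the block: the END terminator may be missing.
    while i + 2 <= block.len() {
        let (size, kind) = (block[i], block[i + 1]);
        if kind == END {
            // An END item MUST have a size of 0.
            return if size == 0 { Some(found) } else { None };
        }
        if size % w != 0 {
            return None;
        }
        let start = i + 2;
        let end = start.checked_add(size / w)?;
        let body = block.get(start..end)?;
        // Unknown kinds are skipped without reading or touching their contents.
        if kind == SYSCALL {
            found.push(body);
        }
        i = end;
    }
    Some(found)
}
```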

Syscall

A SYSCALL item has the following contents:

nmbr: usize - the syscall number
arg0: usize - the first argument
arg1: usize - the second argument
arg2: usize - the third argument
arg3: usize - the fourth argument
arg4: usize - the fifth argument
arg5: usize - the sixth argument
ret0: usize - the first return value
ret1: usize - the second return value
data: ... - data that can be referenced (optional)

Arguments may hold plain numeric values. However, every pointer MUST be translated to an offset from the beginning of the data section.
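As an illustration, the following sketch appends a SYSCALL item for write(fd, bytes) to a growing block. It assumes size counts bytes, that the data is padded to whole words, and that "the data section" means this item's own data field, so the buffer argument becomes offset 0. The `push_write` helper is hypothetical; SYSCALL = 1 comes from the kinds list above and 1 is SYS_write on x86_64 Linux.

```rust
const SYSCALL: usize = 1; // item kind, from the kinds list
const SYS_WRITE: usize = 1; // x86_64 Linux syscall number

/// Appends a SYSCALL item for write(fd, bytes) to `block`.
fn push_write(block: &mut Vec<usize>, fd: usize, bytes: &[u8]) {
    let w = core::mem::size_of::<usize>();
    let data_words = (bytes.len() + w - 1) / w;
    // Header: size excludes the header itself (9 fixed words + data).
    block.push((9 + data_words) * w);
    block.push(SYSCALL);
    // nmbr, arg0..arg5: the buffer pointer becomes offset 0 of the data field.
    block.extend_from_slice(&[SYS_WRITE, fd, 0, bytes.len(), 0, 0, 0]);
    // ret0, ret1: filled in by the host.
    block.extend_from_slice(&[0, 0]);
    // data: the bytes themselves, padded to whole words.
    for chunk in bytes.chunks(w) {
        let mut word = 0usize.to_ne_bytes();
        word[..chunk.len()].copy_from_slice(chunk);
        block.push(usize::from_ne_bytes(word));
    }
}
```

Because every item carries its own data, the offsets stay meaningful no matter where the item ends up in the block.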

Answer 2 · 2021-12-15T13:25:33.000Z

It is only minor changes from Roman's newest PR.

It is a minor change in terms of the protocol, but it raises the complexity of the implementation quite a bit on both sides, so it requires more work (or unsafe workarounds).