System calls

Question


Opened this issue 3 years ago · 5 comments

Depending on a specific target, we may want to use system calls.
However, they must be given types, otherwise they would be unusable in N⋆.

We propose a syntax element syscall <N> : <type> to type a specific syscall (identified by N).

Answer 1 · 2022-09-25T08:58:19.000Z

System calls are a bit harder to handle than initially thought.

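Let's take the example of Linux's open(2) system call on x86-64:

Inputs:
- rax = 0x02
- rdi : const char * = path of the file to open
- rsi : s32 = flags (access mode such as O_RDONLY, O_WRONLY or O_RDWR, plus options such as O_CREAT)
- rdx : umode_t = permission bits for a newly created file
Outputs:
- rax : s32 = file descriptor (or a negated errno value on failure)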

This is impossible to model with a single type¹, and we also want the compiler to insert mv 0x02, %r0 automatically when the user writes syscall 2 (otherwise the user would have to write it redundantly).
In order to model the inputs and outputs, we can parse structures akin to the typechecker's contexts (in typing rules, the context is denoted Ξ; Γ; χ; σ; ε and contains all information we need) and use those within the typechecker².
As for the instructions to generate automatically, we will support macro expansion, all non-terminal N⋆ instructions, as well as a special interrupt N instruction³.
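The expansion described above can be sketched as follows, assuming instructions are plain strings and that %r0 receives the syscall number (as in the mv 0x02, %r0 example). SYSCALL_BODIES and expand_syscall are hypothetical names, not part of N⋆, and Python is used only for illustration.

```python
# Hypothetical declaration table: syscall number -> instructions written after
# "=" in its declaration. "interrupt 0x80" is only illustrative: the real code
# depends on the kernel (see footnote 3).
SYSCALL_BODIES: dict[int, list[str]] = {
    2: ["interrupt 0x80"],
}


def expand_syscall(number: int) -> list[str]:
    """Expand `syscall number` into the instructions the compiler emits.

    The compiler itself prepends the `mv` that loads the syscall number into
    %r0, so the user never writes it; the declared body follows unchanged.
    """
    if number not in SYSCALL_BODIES:
        raise KeyError(f"syscall {number} is not declared")
    return [f"mv {number:#04x}, %r0", *SYSCALL_BODIES[number]]


print(expand_syscall(2))  # ['mv 0x02, %r0', 'interrupt 0x80']
```

The point of the design is that the syscall number appears once, in the declaration, and the load into %r0 can never be forgotten or mistyped at a call site.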

As such, we propose this syntax for syscalls:

non-terminal-instructions := instruction ";" non-terminal-instructions | instruction
context := "(" XI ";" GAMMA ";" CHI ";" SIGMA ";" EPSILON ")"
syscall := "syscall" number ":" context "->" context "=" non-terminal-instructions

where:

XI and GAMMA are comma-separated lists of bindings label: type
CHI is a comma-separated list of bindings register: type
SIGMA is a stack type
EPSILON is a continuation type
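A rough parser for this grammar, as a sketch only: it assumes types never contain parentheses, semicolons or commas, keeps SIGMA and EPSILON as raw text, and reads the syscall number with Python's int(..., 0) so that both 2 and 0x02 are accepted. All Python names, the register names %r0, %r1, %r2, the type names s32/u16 and the variables s and e in the usage example are illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Context:
    xi: dict[str, str]     # XI: label -> type
    gamma: dict[str, str]  # GAMMA: label -> type
    chi: dict[str, str]    # CHI: register -> type
    sigma: str             # SIGMA: stack type, kept as raw text
    epsilon: str           # EPSILON: continuation type, kept as raw text


@dataclass
class Syscall:
    number: int
    pre: Context   # precontext (inputs)
    post: Context  # postcontext (outputs)
    body: list[str]


def parse_bindings(text: str) -> dict[str, str]:
    """Parse a comma-separated list of `name: type` bindings."""
    bindings = {}
    for item in (part.strip() for part in text.split(",")):
        if item:
            name, _, ty = item.partition(":")
            bindings[name.strip()] = ty.strip()
    return bindings


def parse_context(text: str) -> Context:
    """Parse the inside of `( XI ; GAMMA ; CHI ; SIGMA ; EPSILON )`."""
    parts = [part.strip() for part in text.split(";")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 context components, got {len(parts)}")
    xi, gamma, chi, sigma, epsilon = parts
    return Context(parse_bindings(xi), parse_bindings(gamma),
                   parse_bindings(chi), sigma, epsilon)


# Assumes no type contains "(", ")", ";" or ",".
SYSCALL_RE = re.compile(
    r"syscall\s+(\w+)\s*:\s*\(([^()]*)\)\s*->\s*\(([^()]*)\)\s*=\s*(.+)",
    re.DOTALL,
)


def parse_syscall(source: str) -> Syscall:
    match = SYSCALL_RE.fullmatch(source.strip())
    if match is None:
        raise ValueError("not a syscall declaration")
    number, pre, post, body = match.groups()
    instructions = [i.strip() for i in body.split(";") if i.strip()]
    return Syscall(int(number, 0), parse_context(pre), parse_context(post),
                   instructions)
```

For instance, parsing `syscall 0x02 : (;; %r1: s32, %r2: u16; s; e) -> (;; %r0: s32; s; e) = interrupt 0x80` yields a Syscall with number 2, a precontext whose chi is {'%r1': 's32', '%r2': 'u16'}, and the body ['interrupt 0x80'].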

In the typechecker, each part of the context will be unified with the internal context and used as such. System calls modify the internal context as specified by context → context. The first context is the precontext, meaning it describes all the inputs needed for the syscall to operate correctly, and the second context is the postcontext, meaning it describes what the syscall gave us back.
One may bind a register to ! in the postcontext to allow forgetting, or in the precontext to mark callee-saved registers.
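For the register part CHI alone, the precontext/postcontext behaviour can be sketched as below. This is a simplification, not the typechecker's actual algorithm: it compares types by string equality where the text calls for unification, ignores XI, GAMMA, SIGMA and EPSILON, skips ! in the precontext, and assumes that registers absent from the postcontext keep their type. apply_syscall is a hypothetical name.

```python
FORGET = "!"  # the `!` binding described above


def apply_syscall(chi: dict[str, str], pre: dict[str, str],
                  post: dict[str, str]) -> dict[str, str]:
    """Return the register context CHI after a syscall with the given pre/post CHIs.

    Simplifications: types are compared by string equality instead of being
    unified, and `!` in the precontext is not modelled (it is skipped).
    """
    for reg, ty in pre.items():
        if ty != FORGET and chi.get(reg) != ty:
            raise TypeError(f"{reg}: expected {ty}, found {chi.get(reg)}")
    after = dict(chi)  # assumption: registers not mentioned in post keep their type
    for reg, ty in post.items():
        if ty == FORGET:
            after.pop(reg, None)  # the register's type is forgotten
        else:
            after[reg] = ty
    return after
```

With chi = {"%r0": "u64", "%r1": "s32", "%r2": "u16", "%r5": "s8"}, a precontext {"%r1": "s32", "%r2": "u16"} and a postcontext {"%r0": "s32", "%r5": "!"}, the result types %r0 as s32 and drops %r5, while the input dictionary is left unchanged.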

¹ It is possible (this is what is done for labels and continuations), but this is not great. System calls do not jump anywhere (modelling them with continuations would force them to be terminal instructions, which is a bit unsatisfactory), and giving continuations to system calls does not quite make sense.
² We will need to check the consistency of the rules. A simple rule of thumb is that every variable on the left of the arrow must also appear on the right, and every variable on the right must be bound on the left. This prevents the user from writing the rule (Ξ; Γ; χ; σ; ε) → (Ξ'; Γ'; χ'; σ'; ε'), where all the primed versions are fresh identifiers not bound on the left of the arrow.
³ The interrupt N instruction exists in order to generate the correct code for INT (or INT-like) instructions. These codes depend on the kernel, not the CPU, so we cannot generate them when compiling N⋆ instructions to machine code. We could also have another syntactic construction to specify which INT code to use for system calls (something like syscall → X?). Subsequent system call definitions would then be augmented with an implicit interrupt X.
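The rule of thumb from footnote 2 amounts to comparing the sets of variables on each side of the arrow. A sketch, assuming the variables of each context have already been collected into sets (extracting them from N⋆ types is out of scope here); consistency_errors is a hypothetical name.

```python
def consistency_errors(left: set[str], right: set[str]) -> list[str]:
    """Check the rule of thumb: both sides of the arrow use the same variables."""
    errors = [f"{v} is not bound on the left of the arrow"
              for v in sorted(right - left)]
    errors += [f"{v} does not appear on the right of the arrow"
               for v in sorted(left - right)]
    return errors


# The rejected rule from footnote 2: every primed variable is fresh.
bad = consistency_errors({"Ξ", "Γ", "χ", "σ", "ε"}, {"Ξ'", "Γ'", "χ'", "σ'", "ε'"})
print(len(bad))  # 10
```

A rule that threads the same stack and continuation variables through, such as one using s and e on both sides, produces no errors.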

Answer 2 · 2022-09-25T09:11:36.000Z

In a sense, syscalls as presented in the comment above are like (unsafely¹) typed macros.

¹ Unsafely because interrupt does not have a clear type, and also because these macros' types do not live in the grammar itself.

Answer 3 · 2022-09-25T10:52:47.000Z

See https://syscalls32.paolostivanin.com/, https://syscalls64.paolostivanin.com/ or https://syscall.sh/ for system call codes and arguments on Linux.
Note that on x86-64, system calls return their result as an integer in %rax (but we may choose to discard it).

Answer 4 · 2022-09-26T10:06:55.000Z

Regarding footnote 3 of the first answer, which suggests a separate syscall → X construction to specify the INT code used by system calls:

Instead of this, and instead of putting syscall definitions at the top level, we can unify both in a “section” parametrized by the interrupt number.
Something like

syscall 0x80 {
  # exit (number 1 in the i386 int 0x80 table)
  syscall 1: ... → ... = ...
}

That way, a syscall → N declaration can no longer be forgotten. We also gain that N is now scoped (although we should disallow multiple syscall sections with different Ns).

Answer 5 · 2022-09-26T11:01:01.000Z

There is no need to repeat the syscall keyword within the block, because the block itself only contains syscall declarations.
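Desugaring such sections could look like the sketch below. It assumes the implicit interrupt X is appended after each declared body and that declaring the same syscall number twice is an error; neither point is settled in this thread. SyscallDecl and lower_sections are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SyscallDecl:
    number: int
    body: list[str]  # instructions written after "=" (no interrupt)


def lower_sections(sections: list[tuple[int, list[SyscallDecl]]]) -> dict[int, list[str]]:
    """Turn `syscall X { ... }` sections into per-syscall instruction lists.

    Each declaration gets an implicit `interrupt X` appended, and sections
    using different interrupt numbers are rejected.
    """
    interrupts = {x for x, _ in sections}
    if len(interrupts) > 1:
        raise ValueError(f"conflicting syscall sections: {sorted(interrupts)}")
    table: dict[int, list[str]] = {}
    for x, decls in sections:
        for decl in decls:
            if decl.number in table:
                raise ValueError(f"syscall {decl.number} is declared twice")
            table[decl.number] = [*decl.body, f"interrupt {x:#x}"]
    return table


print(lower_sections([(0x80, [SyscallDecl(1, [])])]))  # {1: ['interrupt 0x80']}
```

Several sections sharing the same interrupt number are merged, which matches the idea that N is scoped while only one N is allowed per program.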
