System calls
Depending on the specific target, we may want to use system calls.
However, we have to type them, or else they would be unusable in N⋆.
We propose a syntax element `syscall <N> : <type>` to type a specific syscall (identified by `N`).
System calls are a bit harder to handle than previously thought.
Let's take the example of Linux's `open(2)` system call:
- Inputs:
  - `eax = 0x02`
  - `ebx : s32` = flags
  - `ecx : umode_t` = opening mode (read, write, etc.)
- Outputs:
  - `eax : s32` = file descriptor
This is impossible to model with only a single type [1], and we also want the compiler to insert `mv 0x02, %r0` when trying to `syscall 2` (otherwise it would be redundant).
In order to model the inputs and outputs, we can parse structures akin to the typechecker's contexts (in typing rules, the context is denoted `Ξ; Γ; χ; σ; ε` and contains all the information we need) and use those within the typechecker [2].
As for the instructions to generate automatically, we will support macro expansion, all non-terminal N⋆ instructions, as well as a special `interrupt N` instruction [3].
As such, we propose this syntax for `syscall`s:
non-terminal-instructions := instruction ";" non-terminal-instructions | instruction
context := "(" XI ";" GAMMA ";" CHI ";" SIGMA ";" EPSILON ")"
syscall := "syscall" number ":" context "->" context "=" non-terminal-instructions
where:
- `XI` and `GAMMA` are comma-separated lists of bindings `label: type`
- `CHI` is a comma-separated list of bindings `register: type`
- `SIGMA` is a stack type
- `EPSILON` is a continuation type
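To make the shape of such a declaration concrete, here is a minimal Python sketch of a data model for it. The class names `Context` and `Syscall`, the helper `split_instructions`, and the body `mv 0x02, %r0; interrupt 0x80` are assumptions made for illustration; stack and continuation types are kept as opaque strings rather than parsed.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """One side of a syscall signature: (XI; GAMMA; CHI; SIGMA; EPSILON)."""
    xi: dict = field(default_factory=dict)     # label -> type
    gamma: dict = field(default_factory=dict)  # label -> type
    chi: dict = field(default_factory=dict)    # register -> type
    sigma: str = ""                            # stack type, kept as opaque text
    epsilon: str = ""                          # continuation type, kept as opaque text


@dataclass
class Syscall:
    """syscall <number> : <pre> -> <post> = <body> (hypothetical model)."""
    number: int
    pre: Context
    post: Context
    body: list  # non-terminal instructions, as raw strings


def split_instructions(text):
    """Split 'instruction ; instruction ; ...' into a list of instructions."""
    return [part.strip() for part in text.split(";") if part.strip()]


# The open(2) example from above, using the register names given there.
# The instruction body is illustrative only.
open_syscall = Syscall(
    number=0x02,
    pre=Context(chi={"ebx": "s32", "ecx": "umode_t"}),
    post=Context(chi={"eax": "s32"}),
    body=split_instructions("mv 0x02, %r0; interrupt 0x80"),
)
```

Keeping the precontext and postcontext as two values of the same type mirrors the `context "->" context` production in the grammar.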
In the typechecker, each part of the context will be unified with the internal context and used as such. System calls modify the internal context as specified by `context → context`. The first context is the precontext, meaning it describes all the inputs needed for the syscall to operate correctly, and the second context is the postcontext, meaning it describes what the syscall gave us back.
One may bind a register to `!` in the postcontext to allow forgetting, or in the precontext to mark callee-saved registers.
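The effect of a syscall on the register part of the context could be sketched as follows. This is only an approximation under stated assumptions: unification is replaced by plain equality on type names, only `CHI` is handled, and a `!` in the precontext is simply treated as "no type requirement" because its exact semantics are not pinned down here. The function name `apply_syscall` is hypothetical.

```python
FORGET = "!"


def apply_syscall(chi, pre_chi, post_chi):
    """Return the register context after the syscall, or raise TypeError.

    chi      -- current register context (register -> type name)
    pre_chi  -- CHI of the precontext
    post_chi -- CHI of the postcontext
    """
    # Check the precontext: every required register must have the expected type.
    for reg, expected in pre_chi.items():
        if expected == FORGET:
            continue  # assumption: no type requirement on this register
        found = chi.get(reg)
        if found != expected:
            raise TypeError(f"{reg}: expected {expected}, found {found}")

    # Apply the postcontext: rebind registers, or forget those bound to `!`.
    result = dict(chi)
    for reg, new_type in post_chi.items():
        if new_type == FORGET:
            result.pop(reg, None)
        else:
            result[reg] = new_type
    return result


before = {"ebx": "s32", "ecx": "umode_t", "edx": "u8"}
after = apply_syscall(
    before,
    pre_chi={"ebx": "s32", "ecx": "umode_t"},
    post_chi={"eax": "s32", "ecx": "!"},
)
```

Registers that appear in neither context (`edx` here) are carried over unchanged, which is one possible reading of "modify the internal context".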
Footnotes
1. It is possible, and this is what is done with labels and continuations, but this is not great. System calls do not jump around (they would need to be terminal instructions, which is a bit unsatisfactory), and having continuations for system calls does not seem to quite make sense.
2. We will need to check the consistency of the rules. A simple rule of thumb is that all variables on the left of the arrow must also appear on the right, and every variable on the right must also be present on the left. This prevents the user from writing the rule `(Ξ; Γ; χ; σ; ε) → (Ξ'; Γ'; χ'; σ'; ε')` where all the primed versions are fresh identifiers not bound on the left of the arrow.
3. The `interrupt N` instruction exists in order to generate the correct code for the `INT` (or `INT`-like) instructions. These codes depend on the kernel, not the CPU, so we cannot generate them when compiling N⋆ instructions to machine code. We could also have another syntactic construction to specify which `INT` code to use for system calls (something like `syscall → X`?). Further system call definitions would then be augmented by an implicit `interrupt X`.
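The rule of thumb from footnote 2 can be sketched as a set comparison. The function name `check_rule` is hypothetical, and the sketch assumes the variables of each side have already been collected into plain lists of names; how they are extracted from the parsed contexts is left open.

```python
def check_rule(pre_vars, post_vars):
    """Compare the variables on both sides of the arrow.

    Returns (dropped, fresh): variables bound on the left but missing on the
    right, and variables used on the right but never bound on the left.
    The rule is consistent when both lists are empty.
    """
    pre, post = set(pre_vars), set(post_vars)
    dropped = sorted(pre - post)
    fresh = sorted(post - pre)
    return dropped, fresh


# A consistent rule: the same variables appear on both sides.
ok = check_rule(["xi", "gamma", "chi"], ["chi", "gamma", "xi"])

# The rule the footnote wants to reject: every variable on the right is fresh.
bad = check_rule(["xi", "gamma"], ["xi'", "gamma'"])
```

Returning both differences, rather than a single boolean, leaves room for an error message that names the offending variables.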
See https://syscalls32.paolostivanin.com/, https://syscalls64.paolostivanin.com/ or https://syscall.sh/ for system call codes and arguments on Linux.
Note that all system calls return an integer value in `%rax` (but we may choose to discard it).
> The `interrupt N` instruction exists in order to generate the correct code for the INT (or INT-like) instructions. These codes depend on the kernel, not the CPU, so we cannot generate them when compiling N⋆ instructions to machine code. We could also have another syntactic construction to specify which INT code to use for system calls (something like `syscall → X`?). Further system call definitions would then be augmented by an implicit `interrupt X`.
Instead of this, and instead of putting `syscall` definitions at the top level, we can unify both in a “section” containing the interrupt number.
Something like:
syscall 0x80 {
# exit
syscall 60: ... → ... = ...
}
That way, there's no problem related to forgotten `syscall → N` declarations. We also gain that the `N` is now scoped (though we should disallow multiple `syscall` sections with different `N`s).
There is no need to repeat the `syscall` keyword within the block (because the block itself is only for syscall declarations).
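The scoping constraint on sections could be enforced along these lines. This is a sketch, not a design decision: the class `SyscallTable` and its method `add_section` are hypothetical, declarations are stored as opaque values, and rejecting a syscall number declared twice is an extra assumption that the discussion above does not settle.

```python
class SyscallTable:
    """Collects syscall declarations from `syscall <N> { ... }` sections."""

    def __init__(self):
        self.interrupt = None  # interrupt number shared by all sections
        self.syscalls = {}     # syscall number -> declaration

    def add_section(self, interrupt, declarations):
        # Disallow multiple sections with different interrupt numbers.
        if self.interrupt is not None and self.interrupt != interrupt:
            raise ValueError(
                f"section uses interrupt {interrupt:#x}, "
                f"but {self.interrupt:#x} was already declared"
            )
        self.interrupt = interrupt
        for number, declaration in declarations.items():
            # Assumption: a syscall number may be declared only once.
            if number in self.syscalls:
                raise ValueError(f"syscall {number} declared twice")
            self.syscalls[number] = declaration


table = SyscallTable()
table.add_section(0x80, {60: "exit"})
table.add_section(0x80, {2: "open"})  # same interrupt number: accepted
```

A second section with, say, `0x81` would raise `ValueError`, which matches the idea that `N` is fixed once per program.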