This repository contains a VERY BASIC x86-64 assembler, which is capable of reading assembly-language input and generating a statically linked ELF binary as output.
It is more a proof-of-concept than a useful assembler, but I hope to take it to the state where it can compile the kind of x86-64 assembly I produce in some of my other projects.
Currently the assembler will generate a binary which looks like this:
$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, no section header
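As a rough illustration of where those fields come from, here is a minimal Go sketch that reads the start of an ELF header and reports the same facts. It is not part of this repository; the function name describeELF and the hand-written header bytes are assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// describeELF (a hypothetical helper) reports the type and machine recorded
// in the first 20 bytes of a 64-bit little-endian ELF header.
func describeELF(hdr []byte) (string, error) {
	if len(hdr) < 20 || !bytes.Equal(hdr[:4], []byte{0x7f, 'E', 'L', 'F'}) {
		return "", errors.New("not an ELF header")
	}
	// Byte 4 is the class (2 = 64-bit), byte 5 the data encoding (1 = LSB).
	if hdr[4] != 2 || hdr[5] != 1 {
		return "", errors.New("only 64-bit little-endian is handled in this sketch")
	}
	etype := binary.LittleEndian.Uint16(hdr[16:18])   // 2 = ET_EXEC (executable)
	machine := binary.LittleEndian.Uint16(hdr[18:20]) // 62 = EM_X86_64
	return fmt.Sprintf("ELF 64-bit LSB type=%d machine=%d", etype, machine), nil
}

func main() {
	// Hand-written header prefix: magic, ELFCLASS64, ELFDATA2LSB, EV_CURRENT,
	// padding, then e_type = 2 and e_machine = 62.
	hdr := []byte{0x7f, 'E', 'L', 'F', 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 62, 0}
	desc, err := describeELF(hdr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(desc)
}
```

To try it against a real binary you could pass in the first 20 bytes of a.out instead of the hand-written slice.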
Why? I've written a couple of toy projects that generate assembly language programs, then pass them through an assembler.
The code in this repository was born out of the process of experimenting with generating an ELF binary directly. A necessary learning-process.
We don't support anywhere near the complete instruction-set which an assembly language programmer would expect. Currently we support only things like this:
add $REG, $REG
add $REG, $NUMBER
- Add a number, or the contents of another register, to a register.
call $LABEL
- See call.asm for an example.
dec $REG
- Decrement the contents of the specified register.
- We also support indirection, so the following work:
dec byte ptr [$REG]
dec word ptr [$REG]
dec dword ptr [$REG]
dec qword ptr [$REG]
inc $REG
- Increment the contents of the specified register.
- We also support indirection, so the following work:
inc byte ptr [$REG]
inc word ptr [$REG]
inc dword ptr [$REG]
inc qword ptr [$REG]
jmp $LABEL
je $LABEL
jne $LABEL
- We support jump instructions, but only with signed 8-bit displacements (-128 to +127 bytes).
- See jmp.asm for a simple example.
mov $REG, $NUMBER
mov $REG, $REG
- Move a number, or the contents of another register, into the specified register.
nop
- Do nothing.
push $NUMBER
push $IDENTIFIER
ret
- Return from call.
- NOTE: We don't actually support making calls, though that can be emulated via push; see jmp.asm for an example.
sub $REG, $REG
sub $REG, $NUMBER
- Subtract a number, or the contents of another register, from a register.
xor $REG, $REG
- Set the given register to be zero.
int $NUM
- Call the kernel.
- Processor (flag) control instructions: clc, cld, cli, cmc, stc, std, and sti.
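The short-jump limitation noted for jmp, je and jne follows from the one-byte displacement these encodings carry. Here is a minimal Go sketch of that encoding; it is not the code in compiler/compiler.go, and the function name shortJump and its address parameters are assumptions for illustration. The opcodes (0xEB, 0x74, 0x75) are the standard rel8 forms.

```go
package main

import (
	"fmt"
	"math"
)

// shortJump (a hypothetical helper) encodes jmp, je or jne with an 8-bit
// relative displacement. from is the address of the jump instruction itself,
// to is the address of the target label.
func shortJump(mnemonic string, from, to int) ([]byte, error) {
	opcodes := map[string]byte{"jmp": 0xEB, "je": 0x74, "jne": 0x75}
	op, ok := opcodes[mnemonic]
	if !ok {
		return nil, fmt.Errorf("unknown jump %q", mnemonic)
	}
	// The displacement is measured from the end of the two-byte instruction.
	disp := to - (from + 2)
	if disp < math.MinInt8 || disp > math.MaxInt8 {
		return nil, fmt.Errorf("displacement %d does not fit in 8 bits", disp)
	}
	return []byte{op, byte(int8(disp))}, nil
}

func main() {
	code, err := shortJump("jne", 10, 0) // jump backwards by twelve bytes
	fmt.Printf("% x %v\n", code, err)

	_, err = shortJump("jmp", 0, 200) // too far for a short jump
	fmt.Println(err)
}
```

Anything further away than the one-byte range would need the longer 32-bit displacement forms, which this sketch deliberately leaves out.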
Note that we only support the 64-bit registers listed below, which means rax is supported but eax, ax, ah, and al are specifically not supported:
rax
rcx
rdx
rbx
rsp
rbp
rsi
rdi
There is some support for the extended registers r8-r15, but this varies on a per-instruction basis and should not be relied upon.
There is support for storing fixed-data within our program, and locating that. See hello.asm for an example of that.
We also have some other (obvious) limitations:
- There is notably no support for comparison instructions, and jumping instructions.
- We emulate (unconditional) jump instructions via "push" and "ret"; see jmp.asm for an example of that.
- The entry-point is always at the beginning of the source.
- You can only reference data AFTER it has been declared.
- These are added to the data section of the generated binary, but must be defined first.
- See hello.asm for an example of that.
If you have this repository cloned locally you can build the assembler like so:
cd cmd/assembler
go build .
go install .
If you wish to fetch and install via your existing toolchain:
go get -u github.com/skx/assembler/cmd/assembler
You can repeat for the other commands if you wish:
go get -u github.com/skx/assembler/cmd/lexer
go get -u github.com/skx/assembler/cmd/parser
Of course these binary-names are very generic, so perhaps better to work locally!
Build the assembler:
$ cd cmd/assembler
$ go build .
Compile the sample program, and execute it showing the return-code:
$ cmd/assembler/assembler test.asm && ./a.out ; echo $?
9
Or run the hello.asm example:
$ cmd/assembler/assembler hello.asm && ./a.out
Hello, world
Goodbye, world
You'll note that the \n character was correctly expanded into a newline.
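For illustration, escape expansion of this kind can be sketched in a few lines of Go. This is not the repository's implementation; the function name expandEscapes and the set of escapes handled beyond \n are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// expandEscapes (a hypothetical helper) turns the two-character sequences
// \n, \t and \\ found in a source string into the characters they stand for.
func expandEscapes(s string) string {
	return strings.NewReplacer(`\n`, "\n", `\t`, "\t", `\\`, `\`).Replace(s)
}

func main() {
	// The raw string holds a literal backslash followed by the letter n.
	fmt.Print(expandEscapes(`Hello, world\n`))
	fmt.Print(expandEscapes(`Goodbye, world\n`))
}
```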
The core of our code consists of a small number of simple packages:
- A simple tokenizer lexer/lexer.go
- A simple parser parser/parser.go
- This populates a simple internal-form/AST parser/ast.go.
- A simple compiler compiler/compiler.go
- A simple elf-generator elf/elf.go
- Taken from vishen/go-x64-executable.
In addition to the package modules we also have a couple of binaries:
cmd/lexer
- Show the output of lexing a program.
- This is useful for debugging and development-purposes, it isn't expected to be useful to end-users.
cmd/parser
- Show the output of parsing a program.
- This is useful for debugging and development-purposes, it isn't expected to be useful to end-users.
cmd/assembler
- Assemble a program, producing an executable binary.
The commands located beneath cmd each operate the same way: each takes a single argument, which is a file containing assembly-language instructions.
For example here is how you'd build and test the parser:
$ cd cmd/parser
$ go build .
$ ./parser ../../test.asm
&{{INSTRUCTION xor} [{REGISTER rax} {REGISTER rax}]}
&{{INSTRUCTION inc} [{REGISTER rax}]}
&{{INSTRUCTION mov} [{REGISTER rbx} {NUMBER 0x0000}]}
&{{INSTRUCTION mov} [{REGISTER rcx} {NUMBER 0x0007}]}
&{{INSTRUCTION add} [{REGISTER rbx} {REGISTER rcx}]}
&{{INSTRUCTION mov} [{REGISTER rcx} {NUMBER 0x0002}]}
&{{INSTRUCTION add} [{REGISTER rbx} {REGISTER rcx}]}
&{{INSTRUCTION int} [{NUMBER 0x80}]}
This is how you might add a new instruction to the assembler; for example, you might add jmp 0x00000 or some similar instruction:
- Add a new entry for the instruction in instructions/instructions.go
- i.e. Update the InstructionLengths map to add the instruction.
- This will be used by both the tokenization process, and the parser.
- Generate the appropriate output in compiler/compiler.go, inside the function compileInstruction.
- i.e. Emit the binary-code for the instruction.
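The two steps above can be sketched in isolation as follows. This is a self-contained illustration, not the repository's code: the lower-case instructionLengths map and this compileInstruction signature are hypothetical stand-ins, and the sketch assumes the map's values are operand counts. The example adds hlt, whose single-byte encoding is 0xF4.

```go
package main

import "fmt"

// instructionLengths stands in for the InstructionLengths map. The value is
// assumed here to be the number of operands each instruction takes.
var instructionLengths = map[string]int{
	"nop": 0,
	"ret": 0,
	"hlt": 0, // step one: register the new instruction
}

// compileInstruction stands in for the compiler function of the same name;
// the real signature may well differ.
func compileInstruction(name string, operands []string) ([]byte, error) {
	want, ok := instructionLengths[name]
	if !ok {
		return nil, fmt.Errorf("unknown instruction %q", name)
	}
	if len(operands) != want {
		return nil, fmt.Errorf("%s takes %d operand(s), got %d", name, want, len(operands))
	}
	switch name {
	case "nop":
		return []byte{0x90}, nil
	case "ret":
		return []byte{0xC3}, nil
	case "hlt":
		return []byte{0xF4}, nil // step two: emit the binary code
	}
	return nil, fmt.Errorf("no encoding for %q", name)
}

func main() {
	code, err := compileInstruction("hlt", nil)
	fmt.Printf("% x %v\n", code, err)
}
```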
Launch the binary under gdb:
$ gdb ./a.out
Start it:
(gdb) starti
Starting program: /home/skx/Repos/github.com/skx/assembler/a.out
Program stopped.
0x00000000004000b0 in ?? ()
Disassemble:
(gdb) x/5i $pc
Or show string-contents at an address:
(gdb) x/s 0x400000
Feel free to report bugs; as this is more a proof of concept than a robust tool, they are to be expected.
Specifically we're missing support for many instructions, but I hope the code generated for those that are present is correct.
Steve